I. REAL PARTY IN INTEREST



The real party in interest for the above-identified patent application on Appeal is 

Dendreon Corporation 

by virtue of an Assignment recorded May 20, 2002 at reel 014703, frame 0441 in the United States Patent and Trademark Office.
II. RELATED APPEALS AND INTERFERENCES

Appellant's legal representative and the Assignee of the above-identified patent 
application do not know of any prior or pending appeals, interferences or judicial proceedings 
that may be related to, directly affect or be directly affected by or have a bearing on the 
Board's decision with respect to the above-identified Appeal. 
III. STATUS OF CLAIMS
Claims 1, 10-13, 20, 34-36, 40-46, 48-55, 108, 109, 113-116, 118-120 and 122-126 are pending in the above-identified patent application. Claims 10, 43-46, 48-55, 108, 109, 115, 116, 118-120 and 122-126 are withdrawn from consideration, but are retained for possible rejoinder upon allowance of a generic claim. Claims 1, 11-13, 20, 34-36, 40-42, 113 and 114 are rejected. Therefore, Claims 1, 11-13, 20, 34-36, 40-42, 113 and 114 are the subject of this appeal. A copy of the appealed claims, and all pending claims, is included in the Claims Appendix.
IV. STATUS OF AMENDMENTS

No amendment was filed subsequent to the final rejection. Appellant filed a Notice of 
Appeal on August 14, 2008 (mailed on that date via Express mail certificate of mailing). 

Appellant attaches a copy of the Final Office Action as Exhibit 1 in the Evidence 
Appendix. 
V. SUMMARY OF CLAIMED SUBJECT MATTER

The following is a brief discussion of the claimed subject matter, as described and defined in the application (see, e.g., page 7, last paragraph, through page 8; and page 18, line 13, through page 19). Transmembrane serine proteases (hereinafter MTSPs) are a known family of serine proteases. Their identities and sequences are known, and the prior art teaches that these proteases require activation cleavage for activity. The active form is typically a two-chain or other multi-chain form. There is no teaching or suggestion in any art that an isolated protease domain of the protease has activity as a single chain, nor is there any teaching or suggestion to isolate such a domain. Independent claim 1 is directed to isolated single-chain protease domains of an MTSP that are modified by replacing a free cysteine with another amino acid; all other claims depend therefrom. The free cysteine in the protease domain is not free in the activated full-length molecule. Modifying the single-chain protease domain by replacing the free cysteine prevents the aggregation that results from interaction among the free cysteines of different molecules. Since none of the art suggests that the isolated protease domain has activity, none can suggest modifying the isolated protease domain to avoid aggregation that would affect that activity.

As defined in the application (pages 18-20), an MTSP family member is: 

As used herein, "transmembrane serine protease (MTSP)" refers to a family of 
transmembrane serine proteases that share common structural features as described 
herein (see, also Hooper et al. (2001) J. Biol. Chem. 276:857-860). Thus, 
reference, for example, to "MTSP" encompasses all proteins encoded by the MTSP 
gene family, including but are not limited to: MTSP1, MTSP3, MTSP4 and
MTSP6, or an equivalent molecule obtained from any other source or that has been
prepared synthetically or that exhibits the same activity. Other MTSPs include, but
are not limited to, corin, enteropeptidase, human airway trypsin-like protease
(HAT), MTSP1, TMPRSS2, and TMPRSS4. Sequences of encoding nucleic
molecules and the encoded amino acid sequences of exemplary MTSPs and/or 
domains thereof are set forth in SEQ ID Nos. 1-12, 49, 50 and 61-72. The term also 
encompass MTSPs with conservative amino acid substitutions that do not 
substantially alter activity of each member, and also encompasses splice variants 
thereof. Suitable conservative substitutions of amino acids are known to those of 
skill in this art and may be made generally without altering the biological activity of 
the resulting molecule. Of particular interest are MTSPs of mammalian, including 
human, origin. Those of skill in this art recognize that, in general, single amino 
acid substitutions in non-essential regions of a polypeptide do not substantially alter 
biological activity (see, e.g., Watson et al. Molecular Biology of the Gene, 4th 
Edition, 1987, The Benjamin/Cummings Pub. Co., p. 224). 

The application identifies the known members of the family (corin, enteropeptidase, human airway trypsin-like protease (HAT), hepsin, MTSP1, TMPRSS2, TMPRSS4 and TADG-12), provides sequences of numerous family members, and also provides new family members (e.g., MTSP3, MTSP4 and MTSP6). Pages 10-12 reference sequence identifiers and/or references providing the sequences of each member of the family:

. . . corin (accession nos. AF133845 and AB013874; see, Yan et al. (1999) J.
Biol. Chem. 274:14926-14938; Tomita et al. (1998) J. Biochem. 124:784-789; Yan
et al. (2000) Proc. Natl. Acad. Sci. U.S.A. 97:8525-8529; SEQ ID Nos. 61 and 62
for the human protein); enteropeptidase (also designated enterokinase; accession
no. U09860 for the human protein; see, Kitamoto et al. (1995) Biochem. 27:4562-4568;
Yahagi et al. (1996) Biochem. Biophys. Res. Commun. 219:806-812;
Kitamoto et al. (1994) Proc. Natl. Acad. Sci. U.S.A. 91:7588-7592; Matsushima et
al. (1994) J. Biol. Chem. 269:19976-19982; see SEQ ID Nos. 63 and 64 for the
human protein); human airway trypsin-like protease (HAT; accession no.
AB002134; see Yamaoka et al. J. Biol. Chem. 273:11894-11901; SEQ ID Nos. 65
and 66 for the human protein); hepsin (see, accession nos. M18930, AF030065,
X70900; Leytus et al. (1988) Biochem. 27:11895-11901; Vu et al. (1997) J. Biol.
Chem. 272:31315-31320; and Farley et al. (1993) Biochem. Biophys. Acta
1173:350-352; SEQ ID Nos. 67 and 68 for the human protein); TMPRS2 (see,
Accession Nos. U75329 and AF113596; Paoloni-Giacobino et al. (1997) Genomics
44:309-320; and Jacquinet et al. (2000) FEBS Lett. 468:93-100; SEQ ID Nos. 69
and 70 for the human protein); TMPRSS4 (see, Accession No. NM 016425;
Wallrapp et al. (2000) Cancer 60:2602-2606; SEQ ID Nos. 71 and 72 for the human
protein); and TADG-12 (also designated MTSP6, see SEQ ID Nos. 11 and 12; see
International PCT application No. WO 00/52044, which claims priority to U.S.
application Serial No. 09/261,416).

. . . Exemplary MTSPs (see, e.g., SEQ ID No. 1-12, 49 and 50) are provided 
herein, as are the single chain protease domains thereof as follows: SEQ ID Nos. 1, 
2, 49 and 50 set forth amino acid and nucleic acid sequences of MTSP1 and the
protease domain thereof; SEQ ID No. 3 sets forth the MTSP3 nucleic acid
sequence and SEQ ID No. 4 the encoded MTSP3 amino acids; SEQ ID No. 5
MTSP4 a nucleic acid sequence of the protease domain and SEQ ID No. 6 the
encoded MTSP4 amino acid protease domain; SEQ ID No. 7 MTSP4-L a nucleic
acid sequence and SEQ ID No. 8 the encoded MTSP4-L amino acid sequence; SEQ
ID No. 9 an MTSP4-S encoding nucleic acid sequence and SEQ ID No. 10 the
encoded MTSP4-S amino acid sequence; and SEQ ID No. 11 an MTSP6 encoding
nucleic acid sequence and SEQ ID No. 12 the encoded MTSP6 amino acid 
sequence. The single chain protease domains of each are delineated below. 

As described in the application, and noted above, Appellant has discovered that a single-chain polypeptide that contains only the protease domain of an MTSP possesses protease activity. Prior to this discovery, the dogma in the protease field was that these serine proteases exist as zymogens that require activation cleavage for activity. Activation cleavage occurs at the activation site between the protease domain and another domain of the enzyme, and the resulting chains remain linked by the disulfide bond that forms between a cysteine residue in the protease domain and a cysteine residue in the other domain. As a result of the activation cleavage, the active protease occurs as a two-chain or multi-chain molecule. See, e.g., Lin et al. (J. Biol. Chem. 274:18231-18236 (1999), Exhibit 20), which teaches that serine proteases are synthesized as single-chain zymogens, which are proteolytically activated to become active two-chain forms (e.g., see page 18235, col. 2, first full paragraph); and Takeuchi et al. (Proc. Natl. Acad. Sci. USA 96:11054-11061 (1999), Exhibit 3), which describes the pro-domain region of its MTSP1 as disulfide bonded to the protease domain (see page 11058, col. 1 and page 11060, col. 1, first paragraph) and as remaining bonded to the protease domain after auto-activation (page 11058, lines 8-9), resulting in a two-chain polypeptide that includes a protease domain disulfide bonded to a pro-domain.

The application teaches (see, e.g., page 8, lines 15-21; page 20, lines 1-6; page 25, line 4 through page 26, line 25; page 58, lines 5-11) that the single-chain protease domain is active. The application also teaches how to identify a protease domain (see, e.g., page 8, lines 7-14 and page 19, lines 3-24). For example, at page 18, line 24 through page 20, line 6, the specification defines a protease domain of an MTSP, sets forth the requisites for activity, and explains how to identify a protease domain:

As used herein, a "protease domain of an MTSP" refers to the protease domain 
of MTSP that is located within the extracellular domain of a MTSP and exhibits 
serine proteolytic activity. It includes at least the smallest fragment thereof that acts 
catalytically as a single chain form. Hence it is at least the minimal portion of the 
extracellular domain that exhibits proteolytic activity as assessed by standard assays 
in vitro assays. Those of skill in this art recognize that such protease domain is the 
portion of the protease that is structurally equivalent to the trypsin or chymotrypsin
fold. 

Exemplary MTSP proteins, with the protease domains indicated, are illustrated 
in Figures 1-3. Smaller portions thereof that retain protease activity are 
contemplated. The protease domains vary in size and constitution, including 
insertions and deletions in surface loops. They retain conserved structure, 
including at least one of the active site triad, primary specificity pocket, 
oxyanion hole and/or other features of serine protease domains of proteases. 
Thus, for purposes herein, the protease domain is a portion of a MTSP, as defined 
herein, and is homologous to a domain of other MTSPs, such as corin, 
enteropeptidase, human airway trypsin-like protease (HAT), MTSP1, TMPRSS2,
and TMPRSS4, which have been previously identified; it was not recognized,
however, that an isolated single chain form of the protease domain could function
proteolytically in in vitro assays. As with the larger class of enzymes of the
chymotrypsin (S1) fold (see, e.g., Internet accessible MEROPS data base), the
MTSPs protease domains share a high degree of amino acid sequence identity. 
The His, Asp and Ser residues necessary for activity are present in conserved 
motifs. The activation site, which results in the N-terminus of second chain in the 
two chain forms is has a conserved motif and readily can be identified (see, e.g., 
amino acids 801-806, SEQ ID No. 62, amino acids 406-410, SEQ ID No. 64; amino 
acids 186-190, SEQ ID No. 66; amino acids 161-166, SEQ ID No. 68; amino acids 
255-259, SEQ ID No. 70; amino acids 190-194, SEQ ID No. 72). 

As used herein, the catalytically active domain of an MTSP refers to the
protease domain.

Significantly, it is shown herein that, at least in vitro, the single chain
forms of the MTSPs and the catalytic domains or proteolytically active
portions thereof (typically C-terminal truncations) thereof exhibit protease
activity. Hence provided herein are isolated single chain forms of the protease
domains of MTSPs and their use in in vitro drug screening assays for identification
of agents that modulate the activity thereof.

The specification teaches modified protease domains (see, e.g., page 11, the description for each of the working examples, and the working examples, which describe replacement of the free (unpaired) Cys residue in the protease domain):

Also provided are muteins of the single chain protease domains and MTSPs, 
particularly muteins in which the Cys residue in the protease domain that is free 
(i.e., does not form disulfide linkages with any other Cys residue in the protein) is 
substituted with another amino acid substitution, preferably with a conservative 
amino acid substitution or a substitution that does not eliminate the activity, and 
muteins in which a glycosylation site(s) is eliminated. Muteins in which other 
conservative amino acid substitutions in which catalytic activity is retained are also 
contemplated (see, e.g., Table 1, for exemplary amino acid substitutions). See, also, 
Figure 4, which identifies the free Cys residues in MTSP3, MTSP4 and MTSP6. 

Claims on Appeal and exemplary supporting disclosure in the application 

Claims 1, 11-13, 20, 34-36, 40-42, 113 and 114 are the subject of this appeal and each is argued separately throughout. Independent Claim 1 is directed to an isolated, substantially purified (e.g., see page 46, lines 4-15) single-chain polypeptide, consisting only of a protease domain of a type-II membrane-type serine protease (MTSP) (e.g., see page 17, line 24 through page 19, line 2 and page 25, line 4 through page 26, line 12) or a catalytically active fragment thereof (e.g., see page 26, lines 13-25) as a single chain (e.g., see page 26, lines 13-25 and page 58, lines 5-11), wherein a free Cys (e.g., see page 10, lines 4-6) in the protease domain is replaced with another amino acid (e.g., see page 10, lines 3-13); and the MTSP protease domain or catalytically active fragment thereof has serine protease activity (e.g., see page 31, lines 14-20) as a single chain (e.g., see page 26, lines 13-25 and page 58, lines 5-20; original claim 1). All other claims ultimately depend from claim 1.

Dependent claim 11 is directed to the substantially purified polypeptide of claim 1, wherein the MTSP is selected from among MTSP1, MTSP3, MTSP4 and MTSP6 (e.g., see page 8, line 30 through page 9, line 8 and original claim 11).

Dependent claim 12 is directed to the substantially purified (e.g., see page 46, lines 4-15) polypeptide of claim 1, where the MTSP protease domain consists of a sequence of amino acid residues selected from among amino acids 615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID No. 4, the amino acid residues set forth as SEQ ID No. 6 or as amino acids 217-443 in SEQ ID No. 12 (e.g., see page 25, lines 22-27 and original claim 12).

Dependent claim 13 is directed to the substantially purified (e.g., see page 46, lines 4-15) polypeptide of claim 1 that has at least about 95% sequence identity with a protease domain consisting of a sequence of amino acid residues selected from among amino acids 615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID No. 4, the amino acids set forth as SEQ ID No. 6, and amino acids 217-443 in SEQ ID No. 12 (e.g., see page 25, lines 22-31 and original claim 13).

Dependent claim 20 is directed to the polypeptide of claim 1, where a free Cys in the protease domain is replaced with a serine (e.g., see page 10, lines 3-13, page 163, lines 4-8 and original claim 20).

Dependent claim 34 is directed to the polypeptide of claim 1, where the MTSP is selected from among corin, MTSP1, enteropeptidase, human airway trypsin-like protease (HAT), TMPRSS2, and TMPRSS4 (e.g., see page 8, line 30 through page 9, line 8 and original claim 34).

Dependent claim 35 is directed to a conjugate (e.g., see page 38, lines 1-8 and page 
123, line 30 through page 136, line 2), that includes a) a polypeptide of claim 1, and b) a 
targeting agent (e.g., see page 38, lines 9-15 and page 130, lines 9-17) linked to the protein 
directly or via a linker (e.g., see page 126, line 9 through page 130, line 7), where the 
conjugate has serine protease activity (e.g., see page 10, lines 3-13 and original claim 35). 

Dependent claim 36 is directed to a conjugate of claim 35, wherein the targeting agent 
permits i) affinity isolation or purification of the conjugate; ii) attachment of the conjugate 
to a surface; iii) detection of the conjugate; or iv) targeted delivery to a selected tissue or cell 
(e.g., see page 14, lines 19-26 and original claim 36). 

Dependent claim 40 is directed to a solid support (e.g., see page 126, lines 12-15) 
comprising two or more polypeptides of claim 1 linked thereto either directly or via a linker 
(e.g., see page 131, line 92 through page 134, line 30 and original claim 39).

Dependent claim 41 is directed to the solid support of claim 40 and recites that the 
polypeptides comprise an array (e.g., see page 132, lines 4-8 and original claim 40). 

Dependent claim 42 is directed to the solid support of claim 41 and recites that the 
array includes polypeptides having different MTSP protease domains (e.g., see original claim 41).

Dependent claim 113 is directed to a solid support (e.g., see page 126, lines 12-15) comprising two or more polypeptides of claim 12 linked thereto either directly or via a linker (e.g., see page 126, line 9 through page 130, line 7 and original claim 112).
Dependent claim 114 depends from claim 113 and specifies that the polypeptides comprise an array (e.g., see page 132, lines 4-8 and original claim 113).

A list of the currently pending claims is provided in the Claims Appendix of this Brief.
VI. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL

A. Rejections under 35 U.S.C. §112, first paragraph

1. Claims 1, 11, 20, 34-36, 40-42, 113 and 114 are rejected under 35 U.S.C. §112,
first paragraph, as containing subject matter that was not described in the
specification in such a way as to reasonably convey to one skilled in the relevant
art that the inventor(s), at the time the application was filed, had possession of the
claimed subject matter.

2. Claims 1, 11, 20, 34-36, 40-42, 113 and 114 are rejected under 35 U.S.C. §112,
first paragraph, because the specification, while being enabling for a polypeptide
consisting of amino acids 615-855 of SEQ ID NO:2, allegedly does not
reasonably provide enablement for a polypeptide consisting of any protease
domain of any type II membrane type serine protease (MTSP) or a catalytically
active portion thereof.

B. Rejection under 35 U.S.C. §102(b)

Claims 1, 11-13, 20, 34-36, 40-42, 113 and 114 are rejected under 35 U.S.C.
§102(b) as being anticipated by Takeuchi et al., Proc. Natl. Acad. Sci. USA
96:11054-11061 (1999) ("Takeuchi"), a copy of which is attached in the Evidence
Appendix as Exhibit 3.

C. Rejection under 35 U.S.C. §102(e)

Claims 1, 11-13 and 34 are rejected under 35 U.S.C. §102(e) as anticipated by
O'Brien et al., U.S. Patent No. 5,972,616 ("O'Brien"), a copy of which is attached
in the Evidence Appendix as Exhibit 4.

D. Rejection under 35 U.S.C. §103(a)

Claims 1, 11-13, 34-36, 40-42 and 113-114 are rejected under 35 U.S.C.
§103(a) as being unpatentable over O'Brien.
VII. ARGUMENTS



1. REJECTION OF CLAIMS 1, 11, 20, 34-36, 40-42, 113 AND 114 UNDER 35 U.S.C. §112, FIRST PARAGRAPH - POSSESSION



Claims 1, 11, 20, 34-36, 40-42, 113 and 114 are rejected under 35 U.S.C. §112, first paragraph, as allegedly containing subject matter that was not described in the specification in such a way as to reasonably convey to one skilled in the art that the inventor, at the time the application was filed, had possession of the claimed subject matter. The Examiner alleges that claims 1, 11, 20, 34-36, 40-42 and 113-114 are drawn to a polypeptide consisting of a protease domain, or a catalytically active fragment thereof, of a type-II membrane-type serine protease (MTSP) from any source, and concludes that these claims are drawn to a genus of polypeptides having any structure. The Examiner alleges that the specification teaches only four species, and that four species are not a sufficient number of representative species to describe the whole genus. The Examiner also alleges that there is no evidence on the record of the relationship between the structure of the exemplary catalytically active protease domains and the structure of the serine protease domain of any or all MTSP polypeptides or MTSP1 polypeptides. The Final Office Action concludes that the specification fails to sufficiently describe the claimed subject matter in such full, clear, concise, and exact terms that a skilled artisan would recognize that Appellant was in possession of the claimed subject matter. The rejection respectfully is traversed.

A. LEGAL STANDARDS - 35 U.S.C. §112, FIRST PARAGRAPH - POSSESSION

The purpose behind the written description requirement is to ensure that the patent applicant had possession of the claimed subject matter at the time of filing of the application. The relevant law and a discussion of the Patent Office Guidelines are set forth in the previous responses of record in this application and below. Briefly, the Federal Circuit has discussed the application of the written description requirement of the first paragraph of §112 to claims in the field of biotechnology. See University of California v. Eli Lilly and Co., 119 F.3d 1559, 43 U.S.P.Q.2d 1398, 1406 (Fed. Cir. 1997). The court explained that:

In claims involving chemical materials, generic formulae usually 
indicate with specificity what the generic claims encompass. One skilled in 
the art can distinguish such a formula from others and can identify many of 
the species that the claims encompass. Accordingly, such a formula is 
normally an adequate description of the claimed genus ... a generic 
statement such as "vertebrate insulin" or "mammalian insulin," without more, is
not an adequate written description of the genus because it does not
distinguish the claimed genus from others, except by function. It does not
specifically define any of the genes that fall within its definition. It does not
define any structural features commonly possessed by members of the genus
that distinguish them from others. One skilled in the art therefore cannot, as
one can do with a fully described genus, visualize or recognize the identity of
the members of the genus. A definition by function, as we have previously 
indicated, does not suffice to define the genus because it is only an indication 
of what the gene does, rather than what it is. 

The court also stated that "[a] written description of an invention involving a chemical genus, like a description of a chemical species, 'requires a precise definition, such as by structure, formula, [or] chemical name,' of the claimed subject matter sufficient to distinguish it from other materials." Id. at 1567, 43 U.S.P.Q.2d at 1405. Finally, the court addressed the manner by which a genus might be described: "A description of a genus of cDNAs may be achieved by means of a recitation of a representative number of cDNAs, defined by nucleotide sequence, falling within the scope of the genus or of a recitation of structural features common to the members of the genus, which features constitute a substantial portion of the genus." Id.

The Federal Circuit also has addressed the written description requirement in the context of biotechnology-related subject matter in Enzo Biochem, Inc. v. Gen-Probe Inc., 296 F.3d 1316, 63 USPQ2d 1609 (Fed. Cir. 2002). The Enzo court adopted the standard that:

the written description requirement can be met by 'showing that an invention 
is complete by disclosure of sufficiently detailed, relevant identifying 
characteristics . . . complete or partial structure, other physical and/or chemical
properties, functional characteristics when coupled with a known or 
disclosed correlation between function and structure, or some combination of 
such characteristics.' 

The court in Enzo adopted its standard from the Written Description Examination Guidelines. The Guidelines apply to proteins as well as nucleic acid molecules.

It is well-settled that the written description requirement of 35 U.S.C. §112, first paragraph, can be satisfied without express or explicit disclosure of a later-claimed invention. See In re Herschler, 591 F.2d 693, 700-01, 200 USPQ 711, 717 (CCPA 1979):

"The claimed subject matter need not be described in haec verba to satisfy the 
description requirement. It is not necessary that the application describe the 
claim limitations exactly, but only so clearly that one having ordinary skill in the 
pertinent art would recognize from the disclosure that appellants invented 
processes including those limitations." (citations omitted). 

See also Purdue Pharma L. P. v. Faulding, Inc., 230 F.3d 1320, 56 USPQ2d 1481 (Fed. Cir. 
2000). 



-14- 



Applicant : Madison et al. 
Serial No. : 09/776,191 
Filed : February 2, 2001 




Attorney's Docket No.: 119385-00028 / 1607
APPELLANT'S APPEAL BRIEF 




Customer Number: 77202 

The written description requirement of 35 U.S.C. §112, first paragraph, can be satisfied by providing sufficient disclosure, either through illustrative examples or terminology. This clause does not require "a specific example of everything within the scope of a broad claim." In re Anderson, 176 USPQ 331, 333 (CCPA 1973) (emphasis in original). Further, because "it is manifestly impracticable for an applicant who discloses a generic invention to give an example of every species falling within it, or even to name every such species, it is sufficient if the disclosure teaches those skilled in the art what the invention is and how to practice it." In re Grimme, Keil and Schmitz, 124 USPQ 499, 502 (CCPA 1960).

B. THE REJECTION OF CLAIMS 1, 11, 20, 34-36, 40-42, 113 AND 114 UNDER 35 U.S.C. §112, FIRST PARAGRAPH SHOULD BE REVERSED BECAUSE THE SPECIFICATION MEETS THE WRITTEN DESCRIPTION REQUIREMENT WITH RESPECT TO POSSESSION

Claim 1

In setting forth the rejection, the Examiner states that the claims are drawn to polypeptides having any structure and are thus drawn to a genus encompassing species having substantial variation. The Examiner states that only four species are described in the specification and that there is no evidence on the record of the relationship between the structure of the exemplary catalytically active protease domains and the structure of the serine protease domain of any or all MTSP polypeptides. Appellant respectfully submits that this is not correct.

1. Standard for satisfying the written description requirement for possession

In order to satisfy the written description requirement, one need not provide an example of every species encompassed by a claim. It is sufficient to provide identifying characteristics, including structural and physical characteristics, or functional characteristics coupled with a known or disclosed correlation with structural characteristics, to demonstrate that the applicant was in possession of the claimed subject matter. MPEP § 2163; see University of California v. Eli Lilly, 119 F.3d 1559, 1568, 43 USPQ2d 1398, 1406 (Fed. Cir. 1997). Further, the standard is an objective one, based on what one of skill in the art would recognize in the disclosure. In re Gosteli, 872 F.2d 1008, 1012 (Fed. Cir. 1989). As is discussed in more detail below, it respectfully is submitted that the instant application sufficiently describes the claimed genus of isolated MTSP protease domains to demonstrate possession of the claimed subject matter as of the effective filing date of each claim, as required by this standard.
2. Specification describes more than four species of MTSP protease domains

In this instance, the specification identifies all known members of the family and identifies several new members, including the protease domains (as well as the full-length forms) of MTSP3, MTSP6 and two splice variants of MTSP4. Thus, contrary to the Examiner's assertion that the specification provides only four species of protease domains, Appellant respectfully submits that the application identifies all 17 members of the MTSP family known at the time of filing (see, e.g., page 4), provides the sequences of full-length MTSP proteases, and identifies the protease domains thereof. In addition, the specification teaches how to identify a protease domain in an MTSP, how to identify a free Cys residue, and how to replace a Cys residue. The members of the MTSP family provided include MTSP1 (also referred to as matriptase and TADG-15), MTSP3, MTSP4 (two variants encoded by splice variants), MTSP6, corin, enteropeptidase, human airway trypsin-like protease (HAT), hepsin, TMPRSS2 and TMPRSS4. For example, page 4, line 20 through page 5, line 17 of the specification recites:

In mammals, at least 17 members of the family are known, including seven in humans
(see, Hooper et al. (2001) J. Biol. Chem. 276:857-860). These include: corin (accession
nos. AF133845 and AB013874; see, Yan et al. (1999) J. Biol. Chem. 274:14926-14938;
Tomita et al. (1998) J. Biochem. 124:784-789; Yan et al. (2000) Proc. Natl. Acad. Sci.
U.S.A. 97:8525-8529); enteropeptidase (also designated enterokinase; accession no.
U09860 for the human protein; see, Kitamoto et al. (1995) Biochem. 27:4562-4568;
Yahagi et al. (1996) Biochem. Biophys. Res. Commun. 219:806-812; Kitamoto et al.
(1994) Proc. Natl. Acad. Sci. U.S.A. 91:7588-7592; Matsushima et al. (1994) J. Biol.
Chem. 269:19976-19982); human airway trypsin-like protease (HAT; accession no.
AB002134; see Yamaoka et al. J. Biol. Chem. 273:11894-11901); MTSP1 and matriptase
(also called TADG-15; see SEQ ID Nos. 1 and 2; accession nos. AF133086/AF118224,
AF04280022; Takeuchi et al. (1999) Proc. Natl. Acad. Sci. U.S.A. 96:11054-11061; Lin et
al. (1999) J. Biol. Chem. 274:18231-18236; Takeuchi et al. (2000) J. Biol. Chem.
275:26333-26342; and Kim et al. (1999) Immunogenetics 49:420-429); hepsin (see,
accession nos. M18930, AF030065, X70900; Leytus et al. (1988) Biochem. 27:11895-11901;
Vu et al. (1997) J. Biol. Chem. 272:31315-31320; and Farley et al. (1993)
Biochem. Biophys. Acta 1173:350-352; and see, U.S. Pat. No. 5,972,616); TMPRS2 (see,
Accession Nos. U75329 and AF113596; Paoloni-Giacobino et al. (1997) Genomics
44:309-320; and Jacquinet et al. (2000) FEBS Lett. 468:93-100); and TMPRSS4 (see,
Accession No. NM 016425; Wallrapp et al. (2000) Cancer 60:2602-2606).

Thus, the specification provides 17 examples of MTSPs and isolated protease domains (e.g., see also pages 9-10), including MTSP1, MTSP3, MTSP4 (two splice variants) and MTSP6, incorporates publications describing all known family members and the protease domains thereof, and describes full-length sequences.




3. MTSPs are a known family of serine proteases with known structural features 

As noted, the MTSPs are a known and well-studied family of enzymes; the specification
teaches how to identify members of the MTSP family and provides relevant structural and
functional features that uniquely identify and specify the claimed genus of polypeptides. The
MTSP protease family of enzymes has been extensively studied and characterized, as evidenced
by the art made of record in Information Disclosure Statements and provided in previous
responses and herein. Hooper et al. teaches that many of the serine proteases are mosaic
proteins that include multiple, structurally distinct domains necessary for regulating
enzymatic activity (Eur. J. Biochem. 267: 6931-6937 (2000), Exhibit 14). Lin et al. ((1999)
J. Biol. Chem. 274:18231-36, Exhibit 20) and Yan et al. ((1999) J. Biol. Chem.
274:14926-35, Exhibit 44) teach that MTSPs are a family of proteins that can be
distinguished from many other types of proteins and enzymes because they have highly
conserved structures. For example, as discussed in the instant specification, it is known in the
art that a substrate specificity pocket in the protease domain and conserved cysteines that
participate in disulfide bonding are highly conserved features in serine proteases (see, e.g.,
Figure 4 and page 18235 of Lin et al. (Exhibit 20) and Figure 2 and page 18236 of Yan et al.,
Exhibit 44).

MTSPs are a class of serine proteases characterized by having an NH2-terminal
cytoplasmic tail and a COOH-terminal ectodomain, lacking an NH2-terminal cleavable signal
sequence, and having a signal/anchor domain that anchors the serine protease in the cell
membrane (e.g., see Parks et al., J. Biol. Chem. 268: 19101-19109 (1993), Exhibit 26 and
Parks & Lamb, Cell 64: 777-787 (1991), Exhibit 27). Tsuji et al. teaches that MTSPs, such
as hepsin, include a hydrophobic sequence flanked on the NH2-terminal side by a sequence
with a net positive charge, while the COOH-terminal flanking sequence carries no charge,
which agrees with the consensus topological sequence for the MTSPs (Tsuji et al., J Biol
Chem 266(25): 16948-16953 (1991), Exhibit 37). The MTSPs have the triad of residues
His57, Asp102 and Ser195 at the active site (chymotrypsin numbering system), which are in
close proximity and serve as a functional interacting unit responsible for bond formation and
cleavage during catalysis (Craik et al., Science 237:909-913 (1987), Exhibit 10). Thus, an
MTSP polypeptide can be characterized as a serine protease that includes the conserved
catalytic triad, lacks a cleavable signal sequence, includes a transmembrane anchoring
domain, has positively charged residues on the N-terminal side of a long stretch of
hydrophobic amino acids and has a characteristic disulfide bond pattern (Walter et al., Annu.
Rev. Cell Biol. 2: 499-516 (1986), Exhibit 40). The lack of a signal sequence, a
characteristic disulfide bond pattern, a characteristic hydrophobic region and the presence of
a signal/anchor domain also are seen in all of the MTSPs, including hepsin (Leytus et al.,
Biochemistry 27: 1067-1074 (1988), Exhibit 19), enteropeptidase (Kitamoto et al., Proc.
Natl. Acad. Sci. USA 91: 7588-7592 (1994), Exhibit 17), TMPRSS2 (Paoloni-Giacobino et
al., Genomics 44: 309-320 (1997), Exhibit 31), and human airway trypsin-like protease
(Yamaoka et al., J. Biol. Chem. 273: 11895-11901 (1998), Exhibit 43).

The specification also describes structural features and structure-function
relationships that identify the MTSP family of polypeptides. Such description includes
information regarding the tertiary structure of the polypeptide. For example, the specification
teaches the locus of the disulfide bonds, identifies the Cys residues that link the protease
domain to the rest of the polypeptide, and teaches that the polypeptide includes at least one of
the active site triad, primary specificity pocket and oxyanion hole. The specification states
that the MTSP family of proteins shares a high degree of homology. Hence, other MTSPs,
such as MTSPs from other species, can be readily identified by their homology with known
MTSPs. The specification also teaches that the protease domain of an MTSP shares homology
and structural features with the chymotrypsin/trypsin family protease domains. The previous
responses of record and the application establish that the application describes the MTSP
family and describes identification and isolation of protease domains.

Most significantly, the application identifies the known members of the MTSP family,
provides sequences thereof and/or references earlier publications describing the family
members, and provides working examples for MTSP1, MTSP3, MTSP6 and the two MTSP4
splice variants.

4. The specification provides relevant identifying characteristics of the protease domain

As discussed in responses of record, methods of identifying and isolating serine protease
domains of MTSPs were known in the art at the time of filing the application and are taught in
the specification. The specification describes protease domains of MTSPs and provides
sequences of exemplars thereof. For example, the specification teaches, e.g., at page 19, lines
3-24, that:

Exemplary MTSP proteins, with the protease domains indicated, are illustrated in
Figures 1-3. Smaller portions thereof that retain protease activity are contemplated.
The protease domains vary in size and constitution, including insertions and deletions
in surface loops. They retain conserved structure, including at least one of the active
site triad, primary specificity pocket, oxyanion hole and/or other features of serine
protease domains of proteases. Thus, for purposes herein, the protease domain is a
portion of a MTSP, as defined herein, and is homologous to a domain of other MTSPs,
such as corin, enteropeptidase, human airway trypsin-like protease (HAT), MTSP1,
TMPRSS2, and TMPRSS4, which have been previously identified; it was not
recognized, however, that an isolated single chain form of the protease domain could
function proteolytically in in vitro assays. As with the larger class of enzymes of the
chymotrypsin (S1) fold (see, e.g., Internet accessible MEROPS data base), the MTSPs
protease domains share a high degree of amino acid sequence identity. The His, Asp
and Ser residues necessary for activity are present in conserved motifs. The activation
site, which results in the N-terminus of second chain in the two chain forms is has a
conserved motif and readily can be identified (see, e.g., amino acids 801-806, SEQ ID
No. 62, amino acids 406-410, SEQ ID No. 64; amino acids 186-190, SEQ ID No. 66;
amino acids 161-166, SEQ ID No. 68; amino acids 255-259, SEQ ID No. 70; amino
acids 190-194, SEQ ID No. 72).

The specification also describes how to identify a protease domain of the MTSPs (see, e.g.,
page 8):

The protease domains as provided herein are single-chain
polypeptides, with an N-terminus (such as IV, W, IL and II) generated at
the cleavage site (generally have the consensus sequence R iWGG,
R ilVGG, R WLGG, R iVGLL, R WLGG or a variation thereof; an N-
terminus of R iV or R W, where the arrow represents the cleavage point)
when the zymogen is activated. To identify a protein domain an RI
should be identified, and then following amino acids compared to the
above noted motif[s]. [emphasis added]

The instant specification teaches that the protease domain includes, as a common
structural feature, a conserved catalytic triad. The art of record evidences that this is a
characteristic feature. For example, Lin et al. teaches that membrane-type serine proteases
include an invariant catalytic triad, a characteristic disulfide pattern and a proteolytic
activation site in an Arg-Val-Val-Gly-Gly motif similar to the characteristic RIVGG motif in
other serine proteases (Lin et al., J Biol Chem 274(26): 18231-18236 (1999), Exhibit 21).
Kitamoto et al. teaches that the catalytic domain of MTSPs has a characteristic disulfide bond
pattern (Kitamoto et al., Proc Natl Acad Sci USA 91: 7588-7592 (1994), Exhibit 17). The
specification teaches how to identify members of the MTSP family. For example, page 49,
lines 3-10 of the specification recites:

The MTSPs are a family of transmembrane serine proteases that are found in
mammals and also other species that share a number of common structural
features including: a proteolytic extracellular C-terminal domain; a
transmembrane domain, with a hydrophobic domain near the N-terminus; a short
cytoplasmic domain; and a variable length stem region containing modular
domains. The proteolytic domains share sequence homology including conserved
his, asp, and ser residues necessary for catalytic activity that are present in
conserved motifs.

Accordingly, the specification and the prior art set forth specific structural and physical
features that define MTSPs and their protease domains.
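The identification procedure described in the specification (locate the arginine at the activation cleavage point, then compare the residues that follow it to the conserved motifs) can be illustrated with a short sketch. The sketch is illustrative only and is not taken from the specification: the pattern `[RK][IV][IV]GG` is a simplified, hypothetical consensus assembled from the RIVGG, RVVGG and (R/K)VIGG motifs discussed in this brief, and the function names and the toy sequence are invented for the example.

```python
import re

# Hypothetical, simplified activation-site pattern assembled from motifs cited
# in the brief (RIVGG, RVVGG, (R/K)VIGG). It is NOT the specification's
# consensus; real MTSPs include variants this pattern will miss.
# The lookahead makes each match zero-width so overlapping candidates are found.
ACTIVATION_SITE = re.compile(r"(?=[RK][IV][IV]GG)")


def candidate_cleavage_sites(sequence):
    """Return 1-based positions of the residue just after each R/K match.

    That residue is the putative N-terminus of the single-chain protease
    domain, i.e. the first residue C-terminal to the cleavage point.
    """
    seq = sequence.upper()
    # m.start() is the 0-based index of R/K; +1 steps past it, +1 converts to 1-based.
    return [m.start() + 2 for m in ACTIVATION_SITE.finditer(seq)]


def protease_domain(sequence, site):
    """Return the putative domain from the 1-based cleavage site to the C-terminus."""
    return sequence[site - 1:]


if __name__ == "__main__":
    toy = "MKTAYRIVGGQEASPGSW"  # invented toy sequence, not a real MTSP
    for site in candidate_cleavage_sites(toy):
        print(site, protease_domain(toy, site))
```

With the toy sequence, the sketch reports position 7 and the putative domain IVGGQEASPGSW. A pattern match of this kind is only a first-pass filter; as the brief explains, candidates are compared against the conserved features and tested for catalytic activity.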

5. The specification provides relevant identifying characteristics of the genus

In addition to describing known and newly provided protease domains, the specification
provides relevant identifying characteristics of the "genus" of serine protease domains as
instantly claimed, including conserved structural and functional characteristics of an MTSP
protease domain, provides a number of exemplary protease domains, and also directs those
skilled in the art to exemplary art that describes common structural and functional features
shared by the protease domain of MTSPs. For example, see page 26, lines 13-25, which
recites:

Hence smaller portions of the protease domains, particularly the single chain domains,
thereof that retain protease activity are contemplated. Such smaller versions will
generally be C-terminal truncated versions of the protease domains. The protease
domains vary in size and constitution, including insertions and deletions in surface
loops. Such domains exhibit conserved structure, including at least one structural
feature, such as the active site triad, primary specificity pocket, oxyanion hole and/or
other features of serine protease domains of proteases. Thus, for purposes herein, the
protease domain is a single chain portion of an MTSP, as defined herein, but is
homologous in its structural features and retention of sequence of similarity or
homology the protease domain of chymotrypsin or trypsin. Most significantly, the
polypeptide will exhibit proteolytic activity as a single chain.

The specification teaches that the conserved features of MTSP protease domain
polypeptides include a catalytic triad and an activation cleavage site, which defines the
N-terminus of the protease domain polypeptides when they are isolated as single chain
polypeptides.

The specification further explains that, beyond such conserved features, the polypeptides
tolerate modification, and that such modifications can be effected using numerous methods
known in the art. For example, at page 77, line 17 through page 78, line 11, the specification
states:

A variety of modifications of the MTSP proteins and domains are contemplated
herein. An MTSP-encoding nucleic acid molecule can be modified by any of numerous
strategies known in the art (Sambrook et al., 1990, Molecular Cloning, A Laboratory
Manual, 2d ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, New York). The
sequences can be cleaved at appropriate sites with restriction endonuclease(s), followed
by further enzymatic modification if desired, isolated, and ligated in vitro. In the
production of the gene encoding a domain, derivative or analog of MTSP, care should be
taken to ensure that the modified gene retains the original translational reading frame,
uninterrupted by translational stop signals, in the gene region where the desired activity is
encoded.

Additionally, the MTSP-encoding nucleic acid molecules can be mutated in vitro or
in vivo, to create and/or destroy translation, initiation, and/or termination sequences, or to
create variations in coding regions and/or form new restriction endonuclease sites or
destroy pre-existing ones, to facilitate further in vitro modification. Also, as described
herein muteins with primary sequence alterations, such as replacements of Cys residues
and elimination of glycosylation sites are contemplated. Such mutations may be effected
by any technique for mutagenesis known in the art, including, but not limited to,
chemical mutagenesis and in vitro site-directed mutagenesis (Hutchinson et al., J. Biol.
Chem. 253:6551-6558 (1978)), use of TAB® linkers (Pharmacia). In one embodiment,
for example, an MTSP protein or domain thereof is modified to include a fluorescent
label. In other specific embodiments, the MTSP protein is modified to have a
heterofunctional reagent, such heterofunctional reagents can be used to crosslink the
members of the complex.

The specification incorporates by reference and directs those skilled in the art to
exemplary art that describes common structural and functional features shared by the protease
domain of MTSPs. For example, Lin et al. (J. Biol. Chem. 274:18231-36 (1999), Exhibit 20)
and Yan et al. (J. Biol. Chem. 274:14926-35 (1999), Exhibit 44) teach that MTSPs have
highly conserved structures, including a cleavage site at the N-terminus of the protease
domain, a substrate specificity pocket in the protease domain and highly conserved cysteines
that participate in disulfide bonding (see, e.g., Figure 4 and page 18235 of Lin et al. (Exhibit
20) and Figure 2 and page 18236 of Yan et al. (Exhibit 44)). Other conserved elements
include a conserved activation motif ((R/K)VIGG), residues Asp-627, Gly-655 and Gly-665 in
the substrate pocket, with Asp at the bottom of the substrate pocket, and eight conserved
cysteines that form intramolecular disulfide bonds (Lin et al., J Biol Chem 274(26):
18231-18236 (1999), Exhibit 20). In addition, a correlation between retention of the catalytic
triad and retention of serine protease activity was demonstrated and known in the art at the
time of filing. For example, Craik et al. (Science 237: 909-913 (1987), Exhibit 10), Sprang et
al. (Science 237: 905-909 (1987), Exhibit 35), Carter et al. (Nature 332: 564-568 (1988),
Exhibit 8) and Bachovchin et al. (Proc. Natl. Acad. Sci. 78: 7323-7326 (1981), Exhibit 5)
teach that serine protease activity is retained in an MTSP by retaining the conserved structure
of the catalytic triad.

The specification provides methods for identification, production, isolation, synthesis
and/or purification of MTSP protease domains (see, e.g., working Examples 1-4, which
describe cloning and expression of the protease domains with the Cys replaced; Example 5
demonstrates assays for identifying inhibitors of the catalytic activity of each). The
specification states, for example, that MTSP3, MTSP4 and MTSP6 are isolated from any
animal, particularly a mammal, including, but not limited to, humans, rodents, fowl,
ruminants and other animals (see page 20, lines 21-23; page 21, lines 11-13; and page 21,
lines 29-31, respectively). Methods of obtaining the MTSP protein other than by directly
isolating it also are provided. These include synthesis using genomic DNA, chemically
synthesizing the gene sequence from a known sequence, and making cDNA to the mRNA
that encodes the MTSP protein, for example, and inserting the isolated nucleic acids into an
appropriate cloning vector (for example, see pages 67-79). Methods of identifying and
isolating serine protease domains from MTSPs, such as MTSP1 and matriptase (also referred
to as TADG-15), corin, enteropeptidase, human airway trypsin-like protease (HAT), hepsin,
TMPRSS2 and TMPRSS4, were known in the art at the time of filing the application and are
taught in the specification (e.g., see page 4, line 20 through page 5, line 17).

In addition, the specification provides exemplary assays in which catalytic activity of
the polypeptides can be tested (e.g., see Examples 3 and 4). Thus, the specification describes
the sequences and provides references, which are incorporated by reference, describing all of
the known members of the MTSP family and the protease domains thereof, teaches how to
identify an MTSP, teaches how to identify the protease domain of an MTSP if it is not known
and teaches how to test the polypeptide for proteolytic activity.

The art of record, discussed previously and herein, evidences that, with the
information provided in the specification, the skilled artisan can recognize the protease
domain of an MTSP by its requisite protease domain structure and conserved features. If
necessary, one of skill in the art could test the polypeptides for catalytic activity using the
assays provided in the specification or known to those of skill in the art in order to identify
those polypeptides that possess the requisite catalytic activity.

6. Specification describes modification of MTSP protease domains 

As discussed above, a correlation between retention of the catalytic triad and retention
of serine protease activity was demonstrated and known in the art at the time of filing (e.g., see
Craik et al., Science 237: 909-913 (1987), Exhibit 10). The specification teaches additional
modifications of the MTSP polypeptides such that protease activity is retained. For example,
the specification explains that, for each individual MTSP, the polypeptides can have about
60% amino acid sequence identity with the exemplified MTSP. Such modified polypeptides
exhibit serine protease activity as single chain polypeptides. The specification provides
exemplary modifications including conservative amino acid substitution (for example, see page
10, lines 3-13) and modifications of cysteine residues and/or of glycosylation sites (for
example, see page 78, lines 1-7). The specification also discloses that non-natural amino acids
can be introduced as a substitution or addition in the MTSP polypeptides (for example, see
page 79, lines 10-21). The specification also directs those skilled in the art to exemplary art
that describes common structural features shared by the transmembrane serine proteases (for
example, see page 18, lines 1-15).
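The reference to polypeptides having about 60% amino acid sequence identity with an exemplified MTSP can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions, not the specification's method: it takes two sequences that have already been aligned (gaps written as "-"), uses one common convention (identical residues divided by the number of gap-free aligned columns), and treats 60% as the threshold. Other conventions, such as dividing by the full length of the reference sequence, give different numbers. The function names and toy fragments are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Convention assumed here: identical residues divided by the number of
    aligned columns in which neither sequence carries a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def meets_threshold(aligned_a, aligned_b, threshold=60.0):
    """True if the aligned pair shares at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Invented, pre-aligned toy fragments (not real MTSP sequences).
    print(percent_identity("IV-GG", "IVAGA"))  # 75.0
    print(meets_threshold("IVGGQE", "IVGGKD"))  # True (4 of 6 identical)
```

Because the result depends on how the alignment was produced and which columns are counted, any such figure is meaningful only together with the alignment method and convention used.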

The specification exemplifies the replacement of a free Cys in the protease domain
with another amino acid. For example, the specification states on page 10, lines 3-13 that:

Also provided are muteins of the single chain protease domains and MTSPs, 
particularly muteins in which the Cys residue in the protease domain that is free 
(i.e., does not form disulfide linkages with any other Cys residue in the protein) is 
substituted with another amino acid substitution, preferably with a conservative 
amino acid substitution or a substitution that does not eliminate the activity, and 
muteins in which a glycosylation site(s) is eliminated. Muteins in which other 
conservative amino acid substitutions in which catalytic activity is retained are 
also contemplated (see, e.g., Table 1, for exemplary amino acid substitutions). 
See, also, FIG. 4, which identifies the free Cys residues in MTSP3, MTSP4 and
MTSP6. 

The specification specifically describes the replacement of a free Cys in the protease
domain with another amino acid. For example, Example 1, on page 161, lines 4-9,
exemplifies replacing the free Cys in the protease domain with another amino acid:

To eliminate the free cysteine (at position 310 in SEQ ID No. 4) that exists 
when the protease domain of the MTSP3 protein is expressed or the zymogen is 
activated, the free cysteine at position 310 (see SEQ ID No. 3), which is Cys 122 
if a chymotrypsin numbering scheme is used, was replaced with a serine. 

As discussed below in more detail, working examples for expression of the protease domains
of MTSP3, MTSP1 and both MTSP4 splice variants are provided.

Conclusion 

The claims are directed to isolated single chain protease domains of a known family 
of proteins, the MTSP family. The instant application provides the sequences of 17 of the 
known MTSP family members (directly or by incorporation by reference of references 
providing the sequences). The instant specification provides new members of the MTSP 
family and provides working examples providing the isolated protease domains thereof, 
where the free Cys is replaced with another amino acid. Appellant has discovered that the
isolated single-chain form of the protease domain of these polypeptides is active and useful,
for example, for preparing antibodies specific thereto and in diagnostic assays. Hence, the
recitation in the claims that the polypeptides consist of a protease domain from an MTSP, are
single-chain polypeptides having serine protease activity and have a free Cys in the protease 
domain replaced with another amino acid indicates with specificity what the generic claims 
encompass. One skilled in the art can distinguish such a polypeptide from others and can 
identify species that the claims encompass. Having taught the skilled artisan that the single 
chain protease domain of an MTSP is active, how to identify an MTSP and its protease 
domain, and how to test for activity, the skilled artisan is in possession of the entire genus of 
single chain protease domains. 

An adequate written description for a claimed genus only has to provide "relevant,
identifying characteristics" of a representative number of species (MPEP §2163). It is
respectfully submitted that the instant specification meets this test. As noted, the specification
describes all 17 known species of MTSPs and isolated protease domains (e.g., see pages 9-10),
as well as previously unknown species (MTSP3, MTSP4 (2 splice variants) and MTSP6),
incorporates publications describing all known family members and their full length sequences, 
and provides relevant structural and functional features that uniquely identify and specify the 
claimed genus of polypeptides. The specification teaches that those of skill in the art recognize 
common elements among MTSPs and the protease domains of MTSPs, and teaches a number 
of conserved characteristics for the MTSPs and protease domains thereof, and that the 
sequences and locus of the protease domains are known or can be determined as taught in the 
application. The specification teaches that members of the MTSP family are and were known, 
provides additional members, teaches how to identify and isolate protease domains as single 
chains and how to assess activity. One of skill in the art could, if needed, readily test any of 
those polypeptides for catalytic activity. 

Therefore, in light of Appellant's disclosure, one of skill in the art would have 
recognized from reading the application that Appellant provided single-chain polypeptides with 
the recited protease domain structure that possess serine protease activity. Several elements
of the record, taken together, make this clear: (i) the disclosure of the specific chemical
structures of all 17 species of MTSPs known at the time of filing; (ii) the provision and
description of new species within the scope of the claims; (iii) the teachings in the
specification, and the knowledge of those of skill in the art, of how to identify serine protease
domains (for example, based on homology as known in the art and described in the
specification) and how to isolate a protease domain; (iv) the assays for testing for activity; and
(v) the evidence that those of skill in the art are very familiar with MTSP structure and
function. In combination, these establish that one of skill in the art would recognize that
Appellant had possession of the claimed polypeptides as of the priority date of each claim. One of
skill in the art would have recognized from reading the disclosure that Appellant had 
possession of this genus as well as numerous species thereof. This teaching and knowledge 
coupled with the ability to test for species within the scope of the claims with the assays 
provided for in the specification and known in the art demonstrates that Appellant sufficiently 
described and was in possession of the polypeptides as claimed, at the effective filing date(s) of 
the claims. 

For the reasons above, each of the dependent claims also meets the written description
requirement; additional reasons specific to each dependent claim are set forth below.

Dependent Claim 11 

Claim 11 depends from claim 1 and includes every limitation thereof. Claim 11 recites
that the MTSP is selected from among MTSP1, MTSP3, MTSP4 and MTSP6. The
specification describes MTSP1, e.g., at pages 54-58. The specification describes MTSP3, e.g.,
at pages 58-60 and Example 1 (pages 160-167). The specification describes MTSP4, e.g., at
pages 60-63 and Example 2 (pages 167-171). The specification describes MTSP6, e.g., at pages
63-64 and Example 3 (pages 171-176). The working examples provide isolated protease
domains with the free Cys residue replaced with another amino acid. Working Example 1
describes preparation, cloning and expression of the protease domain of MTSP3; Examples
2 and 4 describe cloning and expression of the protease domains of MTSP3 and MTSP4; and
Example 3 describes cloning of MTSP6. Example 4 describes expression of the MTSP4 (both
variants), MTSP3 and MTSP6 protease domains, with the replaced Cys. Example 6 describes
cloning and isolation of the protease domain of MTSP1. Example 7 describes production of
the protease domain of MTSP1 and purification of the protease domain.

Appellant respectfully submits that, in view of the arguments set forth above with
respect to claim 1 and the teaching in the specification, which describes each of the isolated
protease domains of MTSP1, MTSP3, MTSP4 (two splice variants) and MTSP6, where the
free Cys is replaced with another amino acid, one of skill in the art would recognize that
Appellant was in possession of the subject matter of claim 11 at its effective filing date.

Dependent Claim 20

Claim 20 depends from claim 1 and includes every limitation thereof. Claim 20 
recites that a free Cys in the protease domain is replaced with a serine. For the reasons 
articulated above with respect to claim 1, Appellant respectfully submits that one of skill in 
the art would recognize that Appellant was in possession of a substantially purified single- 
chain polypeptide consisting only of a protease domain of a type-II membrane-type serine 
protease (MTSP) or a catalytically active fragment thereof as a single chain, where the MTSP 
protease domain or catalytically active fragment thereof has serine protease activity as a 
single chain and a free Cys in the protease domain is replaced with another amino acid. 

The specification exemplifies the replacement of a free Cys in the protease domain
with serine. For example, the specification states on page 10, lines 3-13 that:

Also provided are muteins of the single chain protease domains and MTSPs, 
particularly muteins in which the Cys residue in the protease domain that is free 
(i.e., does not form disulfide linkages with any other Cys residue in the protein) is 
substituted with another amino acid substitution, preferably with a conservative 
amino acid substitution or a substitution that does not eliminate the activity, and 
muteins in which a glycosylation site(s) is eliminated. Muteins in which other 
conservative amino acid substitutions in which catalytic activity is retained are 
also contemplated (see, e.g., Table 1, for exemplary amino acid substitutions). 
See, also, FIG. 4, which identifies the free Cys residues in MTSP3, MTSP4 and
MTSP6. 

Table 1 of the specification identifies serine as a substitution for Cys (see page 34, line 6). 
The specification specifically describes the replacement of a free Cys of the protease domain 
with a serine in Example 1, which recites, on page 161, lines 4-9: 

To eliminate the free cysteine (at position 310 in SEQ ID No. 4) that exists 
when the protease domain of the MTSP3 protein is expressed or the zymogen is 
activated, the free cysteine at position 310 (see SEQ ID No. 3), which is Cys 122 
if a chymotrypsin numbering scheme is used, was replaced with a serine. 

Appellant respectfully submits that one of skill in the art would recognize that 
Appellant was in possession of a substantially purified single-chain polypeptide consisting 
only of a protease domain of a type-II membrane-type serine protease (MTSP) or a 
catalytically active fragment thereof as a single chain, where the MTSP protease domain or 
catalytically active fragment thereof has serine protease activity as a single chain and a free 
Cys in the protease domain is replaced with a serine. 

Dependent Claim 34

Claim 34 depends from claim 1 and includes every limitation thereof. Claim 34
recites that the MTSP is selected from among corin, MTSP1, enteropeptidase, human airway
trypsin-like protease (HAT), TMPRSS2, and TMPRSS4. For the reasons articulated above
with respect to claim 1, Appellant respectfully submits that one of skill in the art would
recognize that Appellant was in possession of a substantially purified single-chain
polypeptide consisting only of a protease domain of a type-II membrane-type serine protease 
(MTSP) or a catalytically active fragment thereof as a single chain, where the MTSP protease 
domain or catalytically active fragment thereof has serine protease activity as a single chain 
and a free Cys in the protease domain is replaced with another amino acid. 

The specification specifically recites that the protease domains can be from any
MTSP family member, including corin, MTSP1, enteropeptidase, human airway trypsin-like
protease (HAT), TMPRSS2, and TMPRSS4. For example, see page 8, line 30 through page
10, line 2, which recites:

The protease domains provided herein include, but are not limited to, the single chain 
region having an N-terminus at the cleavage site for activation of the zymogen, through 
the C-terminus, or C-terminal truncated portions thereof that exhibit proteolytic activity 
as a single-chain polypeptide in in vitro proteolysis assays, of any MTSP family member, 
preferably from a mammal, including and most preferably human, that, for example, is 
expressed in tumor cells at different levels from non-tumor cells, and that is not 
expressed on an endothelial cell. These include, but are not limited to: MTSP1 (or
matriptase), MTSP3, MTSP4 and MTSP6. Other MTSP protease domains of interest
herein, particularly for use in in vitro drug screening proteolytic assays, include, but are
not limited to: corin (accession nos. AF133845 and AB013874; see, Yan et al. (1999) J.
Biol. Chem. 274: 14926-14938; Tomita et al. (1998) J. Biochem. 124:784-789; Yan et al.
(2000) Proc. Natl. Acad. Sci. U.S.A. 97:8525-8529; SEQ ID Nos. 61 and 62 for the
human protein); enteropeptidase (also designated enterokinase; accession no. U09860 for
the human protein; see, Kitamoto et al. (1995) Biochem. 27: 4562-4568; Yahagi et al.
(1996) Biochem. Biophys. Res. Commun. 219:806-812; Kitamoto et al. (1994) Proc.
Natl. Acad. Sci. U.S.A. 91:7588-7592; Matsushima et al. (1994) J. Biol. Chem.
269:19976-19982; see SEQ ID Nos. 63 and 64 for the human protein); human airway
trypsin-like protease (HAT; accession no. AB002134; see Yamaoka et al. J. Biol. Chem.
273:11894-11901; SEQ ID Nos. 65 and 66 for the human protein); hepsin (see, accession
nos. M18930, AF030065, X70900; Yamaoka et al. (1988) J Biol Chem 27: 11895-11901;
Vu et al. (1997) J. Biol. Chem. 272:31315-31320; and Farley et al. (1993) Biochem.
Biophys. Acta 1173:350-352; SEQ ID Nos. 67 and 68 for the human protein); TMPRSS2
(see, Accession Nos. U75329 and AF113596; Paoloni-Giacobino et al. (1997) Genomics
44:309-320; and Jacquinet et al. (2000) FEBS Lett. 468: 93-100; SEQ ID Nos. 69 and 70
for the human protein); TMPRSS4 (see, Accession No. NM_016425; Wallrapp et al.
(2000) Cancer 60:2602-2606; SEQ ID Nos. 71 and 72 for the human protein); and
TADG-12 (also designated MTSP6, see SEQ ID Nos. 11 and 12; see International PCT
application No. WO 00/52044, which claims priority to U.S. application Ser. No. 
09/261,416). 
Hence, the application specifically describes the protease domain of MTSP family members
corin, enteropeptidase, HAT, TMPRSS4 and TMPRSS2 and others. Appellant respectfully
submits that, in view of the arguments set forth above with respect to claim 1 and the
teaching in the specification, which describes the protease domain of each of corin,
enteropeptidase, HAT, TMPRSS4 and TMPRSS2, one of skill in the art would recognize that
Appellant was in possession of the subject matter of claim 34 at its effective filing date.

Dependent Claim 35 

Claim 35 recites a conjugate that includes a) a polypeptide of claim 1, and
b) a targeting agent linked to the protein directly or via a linker, wherein the conjugate has
serine protease activity. For the reasons articulated above with respect to claim 1, Appellant
respectfully submits that one of skill in the art would recognize that Appellant was in
possession of a substantially purified single-chain polypeptide consisting only of a protease
domain of a type-II membrane-type serine protease (MTSP) or a catalytically active fragment
thereof as a single chain, where the MTSP protease domain or catalytically active fragment
thereof has serine protease activity as a single chain and a free Cys in the protease domain is
replaced with another amino acid.

The specification specifically discloses conjugates of single-chain protease domains
conjugated to a targeting agent, e.g., at page 14, lines 19-26. The specification teaches that
the conjugates can be prepared by chemical conjugation, recombinant DNA technology or
combinations thereof, and provides detailed descriptions of chemical conjugation, including
acid cleavable, photo-cleavable and heat sensitive linker technology and other linkers, fusion
proteins, peptide linkers, conjugation to targeting agents, and adsorption, absorption and/or
covalent bonding to a solid support (see e.g., pages 123-131).

Appellant respectfully submits that, in view of the arguments set forth above with
respect to claim 1 and the teaching in the specification, which describes conjugates of
single-chain protease domains conjugated to a targeting agent, several different types of conjugation
technologies for making the conjugates and exemplary conjugates, one of skill in the art
would recognize that Appellant was in possession of the subject matter of claim 35 at its
effective filing date.

Dependent Claim 36 

Claim 36 depends from claim 35 and recites a conjugate that includes a targeting
agent that permits i) affinity isolation or purification of the conjugate; ii) attachment of the
conjugate to a surface; iii) detection of the conjugate; or iv) targeted delivery to a selected
tissue or cell. For the reasons articulated above with respect to claims 1 and 35, Appellant
respectfully submits that one of skill in the art would recognize that Appellant was in
possession of a conjugate that includes a substantially purified single-chain polypeptide
consisting only of a protease domain of a type-II membrane-type serine protease (MTSP) or a
catalytically active fragment thereof as a single chain, where the MTSP protease domain or
catalytically active fragment thereof has serine protease activity as a single chain and a free
Cys in the protease domain is replaced with another amino acid and a targeting agent.

The specification recites, e.g., at page 14, lines 19-26 and page 123, line 30 through
page 124, line 7, that the targeting agent of the conjugate permits affinity isolation or
purification of the conjugate; attachment of the conjugate to a surface; detection of the
conjugate; or targeted delivery to a selected tissue or cell. The specification teaches
exemplary targeting agents, including tissue specific or tumor specific monoclonal
antibodies, a growth factor or fragment thereof, such as FGF, EGF, PDGF, VEGF, cytokines,
including chemokines, and other such agents, a protein or peptide fragment that contains a
protein binding sequence, a nucleic acid binding sequence, a lipid binding sequence, a
polysaccharide binding sequence, or a metal binding sequence, or a linker for attachment to a
solid support (see, e.g., page 124, lines 8-17) as well as linkers that allow for attachment of
the conjugate to a surface (see, e.g., pages 131-136). The specification also describes the
construction of affinity binding pairs for isolation and/or purification of the conjugate (e.g.,
see page 131, lines 5-37).

Appellant respectfully submits that, in view of the arguments set forth above with
respect to claims 1 and 35 and the teaching in the specification, which describes several
different types of targeting agents and methods of conjugating such targeting agents to isolated
protease domains, one of skill in the art would recognize that Appellant was in possession of
the subject matter of claim 36 at its effective filing date.

Dependent Claim 40 

Claim 40 recites a solid support comprising two or more polypeptides of claim 1
linked thereto either directly or via a linker. For the reasons articulated above with respect to
claim 1, Appellant respectfully submits that one of skill in the art would recognize that
Appellant was in possession of a substantially purified single-chain polypeptide consisting
only of a protease domain of a type-II membrane-type serine protease (MTSP) or a
catalytically active fragment thereof as a single chain, where the MTSP protease domain or
catalytically active fragment thereof has serine protease activity as a single chain and a free
Cys in the protease domain is replaced with another amino acid.

The specification describes solid supports and methods for immobilizing MTSP
protein to solid supports (e.g., see pages 131-136). The specification teaches exemplary solid
supports, including supports having any required structure and geometry, such as beads,
pellets, disks, capillaries, hollow fibers, needles, solid fibers, random shapes, thin films and
membranes (e.g., page 132, lines 26-29). The specification teaches that a plurality of MTSP
protease domains, including two or more protease domains, can be attached to a solid support
(e.g., page 132, lines 4-8).

Appellant respectfully submits that, in view of the arguments set forth above with
respect to claim 1 and the teaching in the specification, which describes several different
types of solid supports and methods of conjugating isolated protease domains to solid
supports, one of skill in the art would recognize that Appellant was in possession of the
subject matter of claim 40 at its effective filing date.

Dependent Claim 41 

Claim 41 depends from claim 40 and recites that the polypeptides comprise an array.
The specification teaches that a plurality of MTSP protease domains can be attached to a
solid support (e.g., see page 132, lines 4-8). The instant specification defines an array as a
collection of elements containing three or more members and that, as in the case for an
addressable array, the members of the array can be immobilized to discrete identifiable loci
on the surface of a solid phase (e.g., see page 35, lines 14-20). Hence, for these reasons and
the reasons articulated above with respect to claims 1 and 40, Appellant respectfully submits
that one of skill in the art would recognize that Appellant was in possession of an array of
substantially purified single-chain polypeptides consisting only of a protease domain of a
type-II membrane-type serine protease (MTSP) or a catalytically active fragment thereof as a
single chain, where the MTSP protease domain or catalytically active fragment thereof has
serine protease activity as a single chain and a free Cys in the protease domain is replaced
with another amino acid.

Dependent Claim 42 

Claim 42 depends from claim 41 and recites that the array comprises polypeptides
having different MTSP protease domains. Claim 42 as originally filed recited that the array
comprises polypeptides having different MTSP protease domains. The specification teaches
that a plurality of MTSP protease domains can be attached to a solid support (e.g., see page
132, lines 4-8). Appellant respectfully submits that, for these reasons and the reasons
articulated above with respect to claims 1, 40 and 41, one of skill in the art would recognize
that Appellant was in possession of an array of substantially purified single-chain polypeptides
consisting only of a protease domain of a type-II membrane-type serine protease (MTSP) or a
catalytically active fragment thereof as a single chain, where the MTSP protease domains or
catalytically active fragments thereof are different, have serine protease activity as a single
chain and a free Cys in the protease domains is replaced with another amino acid.

Dependent Claim 113 

Claim 113 recites a solid support comprising two or more polypeptides of claim 12
linked thereto either directly or via a linker. Claim 12 is not rejected under 35 U.S.C. §112,
first paragraph. The Examiner states that Appellant was in possession of the isolated
protease domains recited in claim 12, which is directed to the substantially purified
polypeptide of claim 1, where the MTSP protease domain consists of a sequence of amino
acid residues selected from among amino acids 615-855 of SEQ ID No. 2, amino acids
205-437 of SEQ ID NO. 4, the amino acid residues set forth as SEQ ID No. 6 or as amino acids
217-443 in SEQ ID No. 12.

The specification describes solid supports and methods for immobilizing MTSP
protein to solid supports (e.g., see pages 131-136). The specification teaches exemplary solid
supports, including supports having any required structure and geometry, such as beads,
pellets, disks, capillaries, hollow fibers, needles, solid fibers, random shapes, thin films and
membranes (e.g., page 132, lines 26-29). The specification teaches that a plurality of MTSP
protease domains, including two or more protease domains, can be attached to a solid support
(e.g., page 132, lines 4-8).

Appellant respectfully submits that, because the Examiner admits that Appellant
was in possession of the polypeptide of claim 12 and in view of the teaching in the specification,
which describes several different types of solid supports and methods of conjugating isolated
protease domains to solid supports, including conjugating a plurality of isolated protease
domains to a solid support, one of skill in the art would recognize that Appellant was in
possession of the subject matter of claim 113 at its effective filing date.

Dependent Claim 114

Claim 114 depends from claim 113 and specifies that the polypeptides comprise an
array. As discussed above, claim 113 recites a solid support that includes two or more
polypeptides of claim 12. Claim 12 is not rejected under 35 U.S.C. §112, first paragraph.
Thus, the Examiner agrees that Appellant was in possession of the subject matter of claim 12,
which is directed to the substantially purified polypeptide of claim 1, where the MTSP
protease domain consists of a sequence of amino acid residues selected from among amino
acids 615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID NO. 4, the amino acid
residues set forth as SEQ ID No. 6 or as amino acids 217-443 in SEQ ID No. 12.

The specification teaches that a plurality of MTSP protease domains can be attached
to a solid support (e.g., see page 132, lines 4-8). The instant specification defines an array as
a collection of elements containing three or more members and that, as in the case for an
addressable array, the members of the array can be immobilized to discrete identifiable loci
on the surface of a solid phase (e.g., see page 35, lines 14-20). Hence, for the reasons
discussed above with respect to claim 1 and also because the Examiner has concluded that
Appellant was in possession of the subject matter of claim 12, and the specification teaches
and describes the other elements of claim 114, Appellant respectfully submits that one of skill
in the art would recognize that Appellant was in possession of an array of substantially
purified single-chain polypeptides consisting only of a protease domain of a type-II
membrane-type serine protease (MTSP) or a catalytically active fragment thereof as a single
chain, where the MTSP protease domain or catalytically active fragment thereof has serine
protease activity as a single chain and a free Cys in the protease domain is replaced with
another amino acid and where the MTSP protease domain consists of a sequence of amino
acid residues selected from among amino acids 615-855 of SEQ ID No. 2, amino acids
205-437 of SEQ ID NO. 4, the amino acid residues set forth as SEQ ID No. 6 or as amino acids
217-443 in SEQ ID No. 12.



Summary

Appellant respectfully submits that the rejection of claims 1, 11, 20, 34-36, 40-42, 113
and 114 under 35 U.S.C. §112, first paragraph, as allegedly containing subject matter that
was not described in the specification in such a way as to reasonably convey to one skilled in
the art that the inventor, at the time the application was filed, had possession of the claimed
subject matter, is erroneous in law and fact and, therefore, should be reversed.

REJECTION OF CLAIMS 1, 11, 20, 34-36, 40-42, 113 AND 114 UNDER 35 U.S.C.
§112, FIRST PARAGRAPH - Scope of Enablement

Claims 1, 11, 20, 34-36, 40-42, 113 and 114 are rejected under 35 U.S.C. §112, first
paragraph, because the specification allegedly fails to describe the claimed subject matter in
such a way as to enable one skilled in the art to make and use the claimed subject matter
commensurate in scope with these claims. The Examiner states that the specification is
enabling for a polypeptide that includes amino acids 615-855 of SEQ ID NO:2, amino acids
205-437 of SEQ ID NO:4, amino acids of SEQ ID NO:6 and amino acids 217-443 of SEQ ID
NO:12. The Examiner alleges that the specification does not reasonably provide enablement
for a polypeptide consisting of any protease domain of any MTSP or catalytically active portion
thereof and concludes that the claims are drawn to polypeptides having undefined structure.
The Examiner alleges that predicting which changes in a protein's amino acid structure
can be tolerated requires knowledge of, and guidance as to, which amino acids in the sequence,
if any, are tolerant to modification and which are conserved, as well as detailed
knowledge of how the protein's structure relates to function. It is alleged that it would require
undue experimentation for one of skill in the art to make such modified polypeptides with an
expectation of success because the result of such modifications is unpredictable. It is further
alleged that the claimed polypeptides encompass a large number of polypeptides and that the
specification does not provide sufficient guidance on the nature of the changes that can be
tolerated such that the proteins retain activity. In response to Appellant's arguments in the
previous Response, evidencing the extensive knowledge in the art with respect to serine
proteases, the Final Office Action argues that these arguments are not persuasive because the
specification allegedly does not establish which specific amino acids in the protein's sequence
can be modified such that the modified polypeptide continues to have proteolytic activity. The
Examiner alleges that while the art may teach the general structure of MTSP and conserved
amino acid sequences, protease domains, X-ray crystal structure and other attributes, such
teachings "will not reduce the burden of undue experimentation on those of ordinary skill in
the art." Therefore, the Final Office Action concludes, it would require undue experimentation
to produce the claimed polypeptides.

This rejection respectfully is traversed. The pending claims are directed to protease
domains of MTSPs, a well-characterized family of proteins; there is no doubt that this family
of proteins is well known and that those of skill in the art can identify members thereof. It is
the instant application that teaches that the isolated single-chain protease domain possesses
protease activity and that formation of a two-chain structure (by virtue of disulfide bonding
with a Cys in the protease domain, which is free in the single-chain form) is not needed. Thus,
the issue is not identification of an MTSP, but identification of a protease domain in an
MTSP. The application clearly teaches how to identify a protease domain and how to replace
the now-free Cys that would have participated in forming a two-chain structure. There are no
issues regarding undue experimentation to isolate MTSPs.

The specification teaches identification, preparation and isolation of protease domains 
and those of skill in the art, in view of the application, readily can identify and isolate a 
protease domain from any MTSP. As discussed above, with respect to the written description 
rejection, the claims are directed to isolated single chain protease domains. The specification 
teaches that those of skill in the art can identify protease domains and also teaches how to 
identify protease domains. One of skill in the art, in light of the specification, could prepare an 
isolated single chain protease domain, as claimed, for any MTSP and replace the now-free Cys 
with another amino acid. Hence, there is no reason to limit the claims to particular species of
the family, when one of skill in the art, in light of the disclosure, can identify all members of 
the genus. 

A. LEGAL STANDARDS - 35 U.S.C. §112, FIRST PARAGRAPH - ENABLEMENT 

The inquiry with respect to scope of enablement under 35 U.S.C. §112, first paragraph,
is whether it would require undue experimentation to make and use the subject matter as
claimed. A considerable amount of experimentation is permissible, particularly if it is routine
experimentation. The amount of experimentation that is permissible depends upon a number of
factors, which include: the quantity of experimentation necessary, the amount of direction or
guidance presented, the presence or absence of working examples, the nature of the invention,
the state of the prior art, the relative skill of those in the art, the predictability of the art, and the
breadth of the claims (i.e., the "Wands factors"). In re Wands, 8 USPQ2d 1400 (Fed. Cir.
1988).

The starting point in an evaluation of whether the enablement requirement is satisfied is
an analysis of each claim to determine its scope. The focus of the inquiry is whether everything
within the scope of the claim is enabled. As concerns the breadth of a claim relevant to
enablement, the only concern should be whether the scope of enablement provided to one skilled
in the art by the disclosure is commensurate with the scope of protection sought by the claims.
In re Moore, 439 F.2d 1232, 169 USPQ 236 (CCPA 1971). Once the scope of the claims is
addressed, a determination must be made as to whether one skilled in the art is enabled to make
and use the entire scope of the claimed invention without undue experimentation.

It is incumbent upon the Examiner to first establish a prima facie case of non-enablement.
In re Marzocchi, 439 F.2d 220, 223, 169 USPQ 367, 369-70 (CCPA 1971). The
requirements of 35 U.S.C. §112, first paragraph, can be fulfilled by the use of illustrative
examples or by broad terminology. In re Anderson, 176 USPQ 331, 333 (CCPA 1973):

... we do not regard section 112, first paragraph, as requiring a specific example 
of everything within the scope of a broad claim ... What the Patent Office is 
here apparently attempting is to limit all claims to the specific examples, not 
withstanding the disclosure of a broader invention. This it may not do. 

In re Grimme, 274 F.2d 949, 952 (CCPA 1960):

It is manifestly impracticable for an applicant who discloses a generic 
invention to give an example of every species falling within it, or even to 
name every such species. It is sufficient if the disclosure teaches those skilled 
in the art what the invention is and how to practice it. 

Section 112, first paragraph, does not require "a specific example of everything within the
scope of a broad claim." In re Anderson, 176 USPQ 331, at 333 (CCPA 1973) (emphasis in original).
Rather, the requirements of §112, first paragraph, "can be fulfilled by the use of illustrative
examples or by broad terminology." In re Marzocchi et al., 169 USPQ 367 (CCPA
1971) (emphasis added).

The law is clear that patent documents need not include subject matter that is known in 
the field of the invention and is in the prior art, for patents are written for persons experienced 
in the field of the invention. See Vivid Technologies, Inc. v. American Science and 
Engineering, Inc., 200 F.3d 795, 804, 53 USPQ2d 1289, 1295 (Fed. Cir. 1999) ("patents are 
written by and for skilled artisans"). To hold otherwise would require every patent document 
to include a technical treatise for the unskilled reader. Although an accommodation to the 
"common experience" of lay persons may be feasible, it is an unnecessary burden for inventors 
and has long been rejected as a requirement of patent disclosures. See Atmel Corp., 198 F.3d
at 1382, 53 USPQ2d at 1230 (Fed. Cir. 1999) ("The specification would be of enormous and
unnecessary length if one had to literally reinvent and describe the wheel."); W.L. Gore &
Assocs., Inc. v. Garlock, Inc., 721 F.2d 1540, 1556, 220 USPQ 303, 315 (Fed. Cir. 1983)
("Patents are written to enable those skilled in the art to practice the invention, not the public").

The test of enablement is whether one skilled in the art can make and use what is
claimed based upon the disclosure in the application and information known to those of skill in
the art without undue experimentation. United States v. Telectronics, Inc., 8 USPQ2d 1217
(Fed. Cir. 1988). A certain amount of experimentation is permissible as long as it is not undue.
Atlas Powder Co. v. E.I. du Pont de Nemours, 750 F.2d 1569, 224 USPQ 409 (Fed. Cir. 1984).
This requirement can be satisfied by providing sufficient disclosure, either through illustrative
examples or terminology, to teach one of skill in the art how to make and how to use the
claimed subject matter without undue experimentation. In re Anderson, 176 USPQ 331, at 333
(CCPA 1973). The "invention" referred to in the enablement requirement of section 112 is the
claimed subject matter. Lindemann Maschinenfabrik v. American Hoist and Derrick Co., 730
F.2d 1452, 1463, 221 USPQ 481, 489 (Fed. Cir. 1984).

As a matter of Patent Office practice, then, a specification disclosure which 
contains a teaching of the manner and process of making and using the invention 
in terms which correspond in scope to those used in describing and defining the 
subject matter sought to be patented must be taken as in compliance with the 
enabling requirement of the first paragraph of § 112 unless there is reason to 
doubt the objective truth of the statements contained therein which must be relied 
on for enabling support. Assuming that sufficient reason for such doubt does 
exist, a rejection for failure to teach how to make and/or use will be proper on that 
basis; such a rejection can be overcome by suitable proofs indicating that the 
teaching contained in the specification is truly enabling. . . it is incumbent upon 
the Patent Office, whenever a rejection on this basis is made, to explain why it 
doubts the truth or accuracy of any statement in a supporting disclosure and to 
back up assertions of its own with evidence or reasoning which is inconsistent 
with the contested statement. 

Id. (emphasis in original); see also Fiers v. Revel, 984 F.2d 1164, 1171-72, 25 USPQ2d 1601,
1607 (Fed. Cir. 1993); Gould v. Mossinghoff, 229 USPQ 1, 13 (D.D.C. 1985), aff'd in part,
vacated in part, and remanded sub nom. Gould v. Quigg, 822 F.2d 1074, 3 USPQ2d 1302
("there is no requirement in 35 U.S.C. §112 or anywhere else in patent law that a specification
convince persons skilled in the art that the assertions in the specification are correct"). A
patent application need not teach, and preferably omits, what is well known in the art.
Spectra-Physics, Inc. v. Coherent, Inc., 3 USPQ2d 1737 (Fed. Cir. 1987).

PTO GUIDELINES 
The PTO has promulgated guidelines, which incorporate the above-noted law, for 
examining chemical/biotechnical applications with respect to 35 U.S.C. §112, first paragraph,
enablement. As set forth in the guidelines, the standard for determining whether the 
specification meets the enablement requirement is whether it enables any person skilled in the 
art to make and use the claimed invention without undue experimentation. In re Wands, 858 
F.2d 731, 737, 8 USPQ2d 1400 (Fed. Cir. 1988). In determining whether any 
experimentation is "undue," consideration must be given to the above-noted factors. 
As indicated in the published guidelines, it is improper to conclude that a disclosure is
not enabling based on an analysis of only one of the above factors while ignoring one or more 
of the others. The analysis must consider all the evidence related to each of the factors, and 
any conclusion of non-enablement must be based on the evidence as a whole. Id. 8 USPQ2d 
at 1404 & 1407. 

B. THE REJECTION OF CLAIMS 1, 11, 20, 34-36, 40-42, 113 AND 114 UNDER
35 U.S.C. §112, FIRST PARAGRAPH SHOULD BE REVERSED BECAUSE
THE SPECIFICATION MEETS THE ENABLEMENT REQUIREMENT

APPLICATION OF THE FACTORS ENUMERATED IN IN RE WANDS 
Claim 1 

It respectfully is submitted that analysis of enablement requires consideration of all of
the "Wands Factors" and that focusing on one or two of the factors is a misapplication of the
law. Appellant has discussed application of the "Wands Factors" in the previous responses. It
would not require undue experimentation to isolate single-chain protease domains from any 
MTSP polypeptide. Further, it would not require undue experimentation to make modifications 
thereto. The Examiner admits that enzyme isolation techniques and recombinant and 
mutagenesis techniques are known in the art, and that it is routine in the art to screen for 
substitutions or modifications, including multiple substitutions and multiple modifications as 
encompassed by the instant claims (see Final Office Action, Exhibit 2, page 11). As discussed 
in detail below, and previously, a consideration of the factors enumerated in In re Wands 
demonstrates that the application teaches how to make and use the subject matter as claimed 
without undue experimentation. 

i. Breadth of the Claims 

Claim 1 is directed to an isolated substantially purified single-chain polypeptide
consisting only of a protease domain of a type-II membrane-type serine protease (MTSP) or a
catalytically active fragment thereof as a single chain, wherein the protease domain or
catalytically active fragment thereof has serine protease activity as a single chain and a free
Cys in the protease domain is replaced with another amino acid. Claims 11, 20, 34-36,
40-42, 113 and 114 ultimately depend from claim 1 and recite additional features and specific
family members. Claim 11 is directed to the substantially purified polypeptide of claim 1,
and specifies that the MTSP is selected from among MTSP1, MTSP3, MTSP4 and MTSP6.

Claim 20 recites that a free Cys in the protease domain is replaced with a serine. 
Claim 34 recites particular polypeptides within the scope of claim 1. Claims 35 and 36 are
directed to conjugates including a polypeptide of claim 1 and a targeting agent linked to the
protein directly or via a linker. Claims 40-42 are directed to a solid support including two or 
more polypeptides of claim 1 linked thereto either directly or via a linker. Claims 113 and 
114 are directed to a solid support including two or more polypeptides of claim 12 linked 
thereto either directly or via a linker. 

Hence, the claims include as an element an isolated protease domain of a member of
the MTSP family in which a free Cys is replaced with another amino acid. The specification,
as noted, describes all MTSP family members known at the time of filing and provides four 
new members of the family and methods for identifying other members of the MTSP family. 
Thus, the claims are of the same scope as the disclosure in the application. 

ii. Level of Skill 

The level of skill in this art is recognized to be high (see, e.g., Ex parte Forman, 230
USPQ 546 (Bd. Pat. App. & Int. 1986)). The numerous articles and patents made of record
in this application address a highly skilled audience and further evidence the high level of 
skill in this art. 

iii. Teachings of the Specification 

As discussed above and previously, the specification teaches that MTSP polypeptides
constitute a recognized, well-known and well-characterized family of serine proteases. For
example, page 18, lines 1-23 of the specification recites: 

As used herein, "transmembrane serine protease (MTSP)" refers to a family of 
transmembrane serine proteases that share common structural features as described herein
(see, also Hooper et al. (2001) J. Biol. Chem. 276:857-860). Thus, reference, for example,
to "MTSP" encompasses all proteins encoded by the MTSP gene family, including but are
not limited to: MTSP1, MTSP3, MTSP4 and MTSP6, or an equivalent molecule obtained
from any other source or that has been prepared synthetically or that exhibits the same
activity. Other MTSPs include, but are not limited to, corin, enteropeptidase, human
airway trypsin-like protease (HAT), MTSP1, TMPRSS2, and TMPRSS4. Sequences of
encoding nucleic molecules and the encoded amino acid sequences of exemplary MTSPs 
and/or domains thereof are set forth in SEQ ID Nos. 1-12, 49, 50 and 61-72. The term 
also encompasses MTSPs with conservative amino acid substitutions that do not 
substantially alter activity of each member, and also encompasses splice variants thereof. 
Suitable conservative substitutions of amino acids are known to those of skill in this art 
and may be made generally without altering the biological activity of the resulting 
molecule. Of particular interest are MTSPs of mammalian, including human, origin. 
Those of skill in this art recognize that, in general, single amino acid substitutions in non- 
essential regions of a polypeptide do not substantially alter biological activity (see, e.g., 
Watson et al. Molecular Biology of the Gene, 4th Edition, 1987, The Benjamin/Cummings 
Pub. Co., p.224). 
The specification teaches that a protease domain from an MTSP polypeptide is active as a
single-chain polypeptide. Additionally, smaller fragments of the protease domain also are 
active as single-chain polypeptides (page 18, line 24-page 19, line 2): 

As used herein, a "protease domain of an MTSP" refers to the protease domain of 
MTSP that is located within the extracellular domain of a MTSP and exhibits serine 
proteolytic activity. It includes at least the smallest fragment thereof that acts catalytically 
as a single chain form. Hence it is at least the minimal portion of the extracellular domain 
that exhibits proteolytic activity as assessed by standard assays in vitro assays. Those of 
skill in this art recognize that such protease domain is the portion of the protease that is 
structurally equivalent to the trypsin or chymotrypsin fold. 

The specification further teaches that MTSP protease domains can vary in sequence but that 
these proteins retain a conserved structure as well as sequence identity to identified MTSP 
proteins exemplified in the application. For example, see page 19, lines 3-24, which recites: 

Exemplary MTSP proteins, with the protease domains indicated, are illustrated in 
Figures 1-3. Smaller portions thereof that retain protease activity are contemplated. The
protease domains vary in size and constitution, including insertions and deletions in
surface loops. They retain conserved structure, including at least one of the active site
triad, primary specificity pocket, oxyanion hole and/or other features of serine protease
domains of proteases. Thus, for purposes herein, the protease domain is a portion of a
MTSP, as defined herein, and is homologous to a domain of other MTSPs, such as corin,
enteropeptidase, human airway trypsin-like protease (HAT), MTSP1, TMPRSS2, and
TMPRSS4, which have been previously identified; it was not recognized, however, that an
isolated single chain form of the protease domain could function proteolytically in in vitro
assays. As with the larger class of enzymes of the chymotrypsin (S1) fold (see, e.g.,
Internet accessible MEROPS data base), the MTSPs protease domains share a high degree
of amino acid sequence identity. The His, Asp and Ser residues necessary for activity are 
present in conserved motifs. The activation site, which results in the N-terminus of the 
second chain in the two chain forms is has a conserved motif and readily can be identified 
(see, e.g., amino acids 801-806, SEQ ID No. 62, amino acids 406-410, SEQ ID No. 64; 
amino acids 186-190, SEQ ID No. 66; amino acids 161-166, SEQ ID No. 68; amino acids 
255-259, SEQ ID No. 70; amino acids 190-194, SEQ ID No. 72). 

The application describes the full-length sequence and protease domain of all species of MTSP
family members known at the time of filing, including MTSP1, HAT, corin, enteropeptidase,
TMPRSS4 and TMPRSS2. The specification also identifies four new family members.

As discussed above, identification of the protease domain of an MTSP merely
requires identification of the activation cleavage site, as is outlined in the specification,
discussed above and known in the art. The locus of the protease domain in the known MTSP
family members is known, and the instant application provides protease domains from the
known family members, either directly or by incorporation by reference.

Furthermore, notwithstanding that the specification provides and describes the protease 
domain of all members of the family known at the time of filing, plus the four additional family 
members, a comparison of sequence identity among family members (see, e.g., Figure 4 of the 
application) reveals that the protease domains share conserved sequences, including the
catalytic triad of His, Asp and Ser residues and their surrounding conserved motifs.
Additionally, the specification demonstrates that MTSP protease domains can have a
reasonable amount of sequence variation and yet retain serine protease activity. MTSP1,
MTSP3, MTSP4 and MTSP6 protease domains share about 40% sequence identity with each 
other. The specification teaches that each of these protease domains is an example of an MTSP 
protease domain that has activity in the single chain form. 

The specification also teaches additional modifications. For example, see page 26,
lines 13-25, which recites:

Hence smaller portions of the protease domains, particularly the single chain domains, 
thereof that retain protease activity are contemplated. Such smaller versions will generally 
be C-terminal truncated versions of the protease domains. The protease domains vary in 
size and constitution, including insertions and deletions in surface loops. Such domains 
exhibit conserved structure, including at least one structural feature, such as the active site 
triad, primary specificity pocket, oxyanion hole and/or other features of serine protease 
domains of proteases. Thus, for purposes herein, the protease domain is a single chain 
portion of an MTSP, as defined herein, but is homologous in its structural features and 
retention of sequence of similarity or homology the protease domain of chymotrypsin or 
trypsin. Most significantly, the polypeptide will exhibit proteolytic activity as a single 
chain. 

The specification teaches that the conserved features of MTSP protease domain
polypeptides include a catalytic triad as well as the activation cleavage site, which defines the
terminus of the protease domain polypeptides when they are isolated as single chain
polypeptides.

The specification explains that beyond such conserved features the polypeptides are
tolerant of modification. The specification explains that such modifications can be effected
using numerous methods known in the art. For example, at page 77, line 17 through page 78,
line 11, the specification states:

A variety of modifications of the MTSP proteins and domains are 
contemplated herein. An MTSP-encoding nucleic acid molecule can be modified 
by any of numerous strategies known in the art (Sambrook et al., 1990, Molecular 
Cloning, A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory, Cold
Spring Harbor, New York). The sequences can be cleaved at appropriate sites
with restriction endonuclease(s), followed by further enzymatic modification if
desired, isolated, and ligated in vitro. In the production of the gene encoding a 
domain, derivative or analog of MTSP, care should be taken to ensure that the 
modified gene retains the original translational reading frame, uninterrupted by 
translational stop signals, in the gene region where the desired activity is encoded. 

Additionally, the MTSP-encoding nucleic acid molecules can be mutated in 
vitro or in vivo, to create and/or destroy translation, initiation, and/or termination 
sequences, or to create variations in coding regions and/or form new restriction
endonuclease sites or destroy pre-existing ones, to facilitate further in vitro 
modification. Also, as described herein muteins with primary sequence 
alterations, such as replacements of Cys residues and elimination of glycosylation 
sites are contemplated. Such mutations may be effected by any technique for 
mutagenesis known in the art, including, but not limited to, chemical mutagenesis 
and in vitro site-directed mutagenesis (Hutchinson et al., J. Biol. Chem. 253:6551-6558
(1978)), use of TAB® linkers (Pharmacia). In one embodiment, for
example, an MTSP protein or domain thereof is modified to include a fluorescent 
label. In other specific embodiments, the MTSP protein is modified to have a 
heterofunctional reagent, such heterofunctional reagents can be used to crosslink
the members of the complex. 

The specification exemplifies variation in MTSP sequences. For example, the
specification provides exemplary MTSP1, MTSP3, MTSP4 and MTSP6 sequences, including
the sequences of the isolated protease domains. The specification also provides sequences of
other family members and, as discussed above, teaches how to identify the protease domain based
on the consensus sequence thereof, which is conserved among serine proteases. The
specification explains that MTSP1 and MTSP3 amino acid sequences have about 43% identity
with each other (for example, see page 162, lines 1-2). The specification also discloses that
MTSP1 and MTSP4 have about 37% amino acid sequence identity (for example, see page
167, lines 25-29). The specification also teaches that MTSP4 and MTSP6 share about 60%
amino acid sequence identity (for example, see page 172, lines 4-9). The specification teaches
that each of the protease domains of these MTSP family members is active as a single chain that
contains only the protease domain or a smaller catalytically active portion of the protease
domain (see, for example, page 20, lines 1-6). Hence, the specification teaches that MTSP
protease domains that retain the conserved catalytic triad are tolerant of sequence modification
yet retain activity, and demonstrates that exemplary polypeptides that retain the catalytic triad
and that have about 40%-60% and greater sequence identity are active as single chain
polypeptides.

Notwithstanding differences among the sequences of the family members, the
specification teaches and provides sequences of most of the family members, refers to
publications that describe other family members, and teaches how to identify a protease domain.
As discussed above, the instant claims are directed not to the discovery of MTSPs as a family,
but to the discovery that the isolated protease domain has activity as a single-chain isolated
polypeptide. Once one of skill in the art has an MTSP of any type or sequence, one of skill in
the art, based on the teachings in this specification, can isolate the single chain protease domain
thereof. The specification clearly provides guidance for doing so.

The specification teaches modifications of the MTSP polypeptides. For example, the
specification provides exemplary modifications including conservative amino acid 
substitution (for example, see page 10, lines 3-13) and modifications of cysteine residues 
and/or of glycosylation sites (for example, see page 78, lines 1-7). The specification also 
discloses that non-natural amino acids can be introduced as a substitution or addition in the 
MTSP polypeptides (for example, see page 79, lines 10-21). 

More significantly, the pending claims are directed not to full-length MTSPs, but to
isolated single-chain protease domains that have serine protease activity and in which the
free Cys is replaced with another amino acid. One of skill in the art, with an MTSP polypeptide
in hand, could readily identify and isolate the protease domain of any MTSP as claimed and
replace a free Cys with another amino acid residue.

iv. Knowledge of those of skill in the art

As discussed above, at the time of filing of the application and before, those of skill in the 
art were very familiar with serine proteases generally, and with the MTSP family in particular. 
The MTSP family was known as was the locus of the protease domain in members of the MTSP 
family. What was absent was any understanding or recognition that an isolated single chain 
protease domain would have activity; hence, such was never isolated. In view of the instant 
application teaching that such protease domains have activity as single chain polypeptides, the 
skilled artisan can readily isolate any protease domain of an MTSP as a single chain and if 
necessary test the isolated protease domain for the requisite activity. Nothing more need be 
known regarding the requisites for activity. 

Notwithstanding this, there was a large body of literature directed to serine proteases and
there was general understanding of their structures and requisites for activity (see, for example,
Hooper et al., J. Biol. Chem. 276: 857-860 (2001), Exhibit 15; Nienaber et al., J. Biol. Chem.
275: 7239-7248 (2000), Exhibit 24; Sommerhoff et al., Proc. Natl. Acad. Sci. USA 96: 10984-10991
(1999), Exhibit 34; Lu et al., J. Mol. Biol. 292: 361-373 (1999), Exhibit 21; Xu et al., J.
Biol. Chem. 275: 378-385 (2000), Exhibit 41; Lin et al., J. Biol. Chem. 274: 18231-18236
(1999), Exhibit 20; and Bryan, Biochim. Biophys. Acta 1543: 200-203 (2000), Exhibit 7).
These references detail the existing crystal structures, structural comparisons and structural 
similarities of MTSPs. 
This extensive knowledge also is evidenced, for example, in the application as filed and
in the literature made of record in the submitted Information Disclosure Statements. As noted
in the application, the MTSP protease family was known (for example, see pages 4-5). Serine
proteases are a family that can be distinguished from many other types of proteins and enzymes
because they have highly conserved structures (see, e.g., Lin et al., J. Biol. Chem. 274: 18231-18236
(1999), Exhibit 20 and Yan et al., J. Biol. Chem. 274: 14926-14935 (1999), Exhibit 44).
Moreover, it was known at the time of filing that there is a correlation between
retention of the catalytic triad and retention of serine protease activity. Hence, available to one
of skill in the art was the knowledge that serine protease activity could be retained in a serine
protease by retaining the conserved structure of the catalytic triad (see, for example, Carter et
al., Nature 332: 564-568 (1988), Exhibit 8; Sprang et al., Science 237: 905-909 (1987), Exhibit
35; Craik et al., Science 237: 909-913 (1987), Exhibit 10; and Bachovchin et al., Proc. Natl.
Acad. Sci. 78: 7323-7326 (1981), Exhibit 5). In addition, other features were identified at the
time of filing and before as highly conserved features in serine proteases, including a cleavage
site at the N-terminus of the protease domain, a substrate specificity pocket in the protease
domain and conserved cysteines that participate in disulfide bonding (see, for example, Figure 4
and page 18235 of Lin et al. (Exhibit 20) and Figure 2 and page 18236 of Yan et al. (Exhibit
44)). Thus, the requisites for retention of serine protease activity were well known and
characterized and were available at the effective filing date of the claimed subject matter.
Hence, a wide variety of structural information on serine proteases was well known in the art.

Furthermore, the instant claims only require identification of the protease domain of an
MTSP, and its isolation as a single chain polypeptide. The specification includes and describes
the protease domains of all MTSP family members known at the time of filing the application.
Based on the teachings of the specification and what was known in the art, those of skill in the
art can readily identify the protease domain region in an MTSP using, e.g., the catalytic triad, the
cleavage site at the N-terminus of the protease domain and conserved cysteines that participate
in disulfide bonding as markers, and, if necessary, test it for protease activity. Dawson et al.
(U.S. Pat. No. 5,645,833 (1997), Exhibit 11) teaches that the serine protease domain can be
recognized by its homology with other serine proteases (col. 6, lines 29-32).

The methods and guidance for comparing amino acid sequences to generate and
confirm sequences with sequence identity to an MTSP polypeptide sequence such as SEQ ID
NOS: 2, 4, 6 and 12 were available and routine in the art at the time of filing the instant
application. As described in the instant specification, computer algorithms such as the
"FASTA" program, using, for example, the default parameters as in Pearson et al., Proc. Natl.
Acad. Sci. USA 85: 2444 (1988), Exhibit 28, were available. Other programs were also available
(see Devereux et al., Nucleic Acids Research 12(1):387 (1984), Exhibit 12). In addition,
methods for generating nucleotide and protein sequence variation were widely available in
the art. Thus, one of skill in the art could use such programs with a serine protease sequence,
for example, to align the sequence and identify the structural features of importance for
retention of activity, and could use the methods for generating sequence variation to make
protein variants.
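As an illustration only of the kind of routine computation referred to above, the following Python sketch computes percent identity for two sequences that have already been aligned, with gaps written as "-". It is not the FASTA or GCG algorithm and performs no alignment itself. The function name percent_identity, and the convention of comparing only columns in which neither sequence has a gap, are assumptions chosen for this example; other programs may count identity differently. The sequences shown are arbitrary toy strings, not MTSP sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity between two pre-aligned sequences.

    Assumed convention (one of several in use): only columns where
    neither sequence has a gap ('-') are compared, and identity is the
    share of those columns that carry the same residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns in which both sequences have a residue.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy example: 5 ungapped columns, 4 of them identical.
print(percent_identity("IVGG-A", "IVGGSS"))  # 80.0
```

A figure such as the roughly 40% identity discussed in this brief would, in practice, be computed over a full alignment produced by a program such as FASTA; this sketch covers only the final counting step.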

Methods for assaying protease activity, including protease specificity, level of activity
and response to inhibitors, were well known in the art (see, for example, Lu et al., J. Mol. Biol.
292: 361-373 (1999) (Exhibit 21) and Xu et al., J. Biol. Chem. 275: 378-385 (2000) (Exhibit
41)). Methods for high throughput assays and detection also were widely available (see,
generally, Silverman et al., Curr. Opin. Chem. Biol. 2:397-403 (1998) (Exhibit 32) and
Sittampalam et al., Curr. Opin. Chem. Biol. 1:384-91 (1997) (Exhibit 33)). Hence, the
knowledge of those of skill in the art was extensive, and the requisite structural and
functional features required for protease activity were well known.

The Examiner states that the specific amino acid positions within a protein's sequence
where amino acid modification can be made with a reasonable expectation of success in
obtaining the desired activity are limited in any protein and the result of such modifications is
unpredictable. Appellant respectfully disagrees in the case of the family of MTSPs. The
application and the art made of record establish that MTSPs are well known in the art, that the
structural requirements for activity are known, and that the instantly claimed polypeptides
share sequence homology with the chymotrypsin/trypsin family, for which tertiary structures
are known. For example, it was known in the art that serine protease activity could be
retained in an MTSP by retaining the conserved structure of the catalytic triad (see, e.g., Craik
et al., Science 237: 909-13 (1987), Exhibit 10 and Carter et al., Nature 332: 564-568 (1988),
Exhibit 8). Other highly conserved features in serine proteases also were known to the skilled
artisan. These include a cleavage site at the N-terminus of the protease domain, a substrate
specificity pocket in the protease domain and conserved cysteines that participate in disulfide
bonding (see, e.g., Figure 4 and page 18235 of Lin et al. (Exhibit 20) and Figure 2 and page
18236 of Yan et al. (Exhibit 44)). The specification also provides exemplary assays for testing
catalytic activity of the polypeptides using routine experimental analysis techniques, also
provides descriptions of how to assess percentage identity, and teaches that these techniques
were well known in the art. The specification also teaches conserved characteristics among
MTSPs. Furthermore, the MTSPs are a known family of serine proteases, and the protease
domain of any member can be readily identified using methods and techniques known in the
art and/or described in the specification. The serine proteases were among the first enzymes
to be studied extensively (Perona & Craik, Protein Science 4: 337-360 (1995), Exhibit 30).

Furthermore, the instant claims are directed to the single-chain protease domain or
active portion thereof, where the protease domain is modified to replace a free Cys with another
amino acid (for example, to prevent aggregation by virtue of interaction among the free Cys
residues). The claims on appeal are directed not to new MTSPs per se, but to the protease
domains of MTSPs.

The Examiner states that recombinant and mutagenesis techniques and enzyme
isolation techniques are known and that it is routine to screen for multiple substitutions or
multiple modifications as encompassed by the instant claims (see Final Office Action, Exhibit
1, page 11). Thus, routine techniques can be used to identify or synthesize modified MTSP
serine protease domains. If needed, one of skill in the art can test polypeptides for catalytic
activity by routine experimentation using the assays provided in the specification or known to
those of skill in the art.

v. Working Examples

The application provides working examples that demonstrate each of the features of the
claimed polypeptides. For instance, the Examples provide detailed guidance for identifying
and isolating MTSP protease domains. Example 1 describes the cloning of the full-length
MTSP3 and of its protease domain, and replacement of the free Cys in the isolated protease
domain with another amino acid. Example 1 also describes expression of the MTSP3 protease
domain with the replaced Cys, and the use of MTSP3-encoding nucleic acid as a probe to assess
tissue-specific and tumor-specific expression of MTSP3.

Example 2 describes the identification and cloning of two MTSP4 polypeptides,
MTSP4-S and MTSP4-L. Example 2 describes cloning of the full-length polypeptides and also
the protease domains thereof, and also describes uses of the clones to obtain gene expression
profiles. Example 3 describes the identification and cloning of an MTSP6 polypeptide and
protease domain thereof, and also gene expression profiles. Example 4 describes expression of
the MTSP4 (both variants), MTSP3 and MTSP6 protease domains, with the replaced Cys.
Example 6 describes cloning and isolation of the protease domain of MTSP1. Example 7
describes production of the protease domain of MTSP1 and purification of the protease
domain. In each case, an MTSP polypeptide sequence is identified that includes a protease
domain with a cleavage site and a catalytic triad (see, e.g., Figure 4). As noted, for example, in 
Example 1, identification of MTSP3 as a serine protease required only 43% sequence identity. 
Similarly, Example 2 demonstrates that 37% sequence identity with MTSP1 was sufficient to
identify MTSP4. 

The Examples demonstrate additional features of the claimed polypeptides. For
example, the Examples demonstrate production and expression of MTSP protease domains
in which the free Cys is replaced with another amino acid. The working examples further
demonstrate that the MTSP polypeptides, sharing, for example, 37-43% sequence identity, are
active as single chain protease domains.

The Examples demonstrate expression of single chain protease domains. Examples 4
and 5 describe additional expression of MTSP3, MTSP4 and MTSP6 using Pichia pastoris.
Examples 6 and 7 provide a detailed description of the cloning, expression and purification of
an MTSP1 single chain protease domain. Example 8 provides detailed serine protease assays
for MTSP1. Additionally, the examples demonstrate replacement of the free Cys. For
example, Example 1 demonstrates that replacing the cysteine with serine does not substantially
alter serine protease activity. The examples demonstrate identification of a variety of MTSPs,
sharing 37-43% sequence identity, and the expression of the protease domains thereof, where
the Cys is replaced with another amino acid.



vi. Predictability

The predictability at issue is whether one of skill in the art could isolate protease
domains from MTSP family members and variants thereof. The issue is not whether the
claims encompass variant MTSPs, but whether one of skill in the art in possession of an
MTSP could prepare an isolated protease domain in which a free Cys is replaced with another
amino acid. Predictability goes to reproducibility. Issues regarding modification of MTSPs
and requisites therefor are irrelevant. Appellant respectfully submits that one of skill in the
art, given the instant disclosure, could predictably make such polypeptides, because the
MTSP family is well known and characterized and the sequences of exemplary new family
members, as well as all known members, are provided in the application. One of skill in the
art can readily make minor amino acid variations using routine techniques and, if needed, test
such polypeptide variants for serine protease activity. The working examples demonstrate
this reproducibility with five different polypeptides (MTSP1, MTSP3, MTSP4-S, MTSP4-L and
MTSP6). There is no doubt that isolation of a protease domain from an MTSP is reproducible
and, thus, predictable. There is no doubt that one of skill in the art could prepare an isolated
protease domain as claimed using techniques routinely practiced in this art.

In contrast to the allegations of "unpredictability" set forth in the Final Office Action, 
the specification and the knowledge in the art evidence many factors of predictability with 
respect to MTSP polypeptide variants. First, the specification identifies all known MTSP 
family members, including the sequences thereof (in the sequence listing and/or by 
incorporation by reference of others) and also provides new family members. These are 
defined chemical structures from which one of skill in the art is given a reference point. As 
explained above, included among exemplary polypeptides are MTSP1, MTSP3, MTSP4-S,
MTSP4-L, MTSP6, HAT, corin, enteropeptidase, TMPRSS4 and TMPRSS2. The 
specification demonstrates that these MTSP polypeptides, as well as all family members, share 
conserved features including a protease domain with a catalytic triad and N-terminal activation 
cleavage site. Furthermore, the specification teaches isolation of the protease domains as 
single chains and demonstrates that they possess proteolytic activity. As discussed above, the 
specification provides detailed guidance for identifying a protease domain of any MTSP family 
member. 

Second, the specification delineates structural and functional features of the protein.
These features identify key regions and residues that one of skill in the art would know to
conserve in order to retain serine protease activity. These features also provide reference
points for alignments with other known serine proteases. These features also allow one of
skill in the art to make further structure-function correlations, again providing predictable
correlations of regions and residues to conserve or change. As evidenced by the references
cited in the specification and in the Information Disclosure Statements of record in this
application and provided herein, a large body of knowledge pertaining to structure-function
relationships of serine proteases was known in the art. In addition, the specification provides
exemplary assays, including a variety of substrates, to assess the serine protease activity of
MTSPs. One of skill in the art can readily and routinely test any MTSP family
member protease domain or a variant thereof for serine protease activity as a single chain
protease.

As taught in the specification as well as evidenced by the art of record, maintenance
of the catalytic triad is sufficient to retain serine protease activity (see, e.g., Carter et al.,
Nature 332: 564-568 (1988), Exhibit 8 and Craik et al., Science 237: 909-913 (1987),
Exhibit 10). Therefore, one of skill in the art could make MTSP family
member protease domains from any MTSP known to one of skill in the art or identify
protease domains in new MTSP family members. In the unlikely event that it was needed,
protease activity could easily and routinely be confirmed using the assays provided in the
application and known in the art. The routine manipulations to identify and isolate an MTSP
protease domain as a single chain are known in the art.

The experimentation necessary to isolate and use protease domains of MTSP
polypeptides, as described above, is commonly practiced in this art and routine. "Enablement
is not precluded by the necessity for some experimentation such as routine screening.
Experimentation needed to practice the invention must not be undue experimentation. 'The key
word is undue, not experimentation.'" In re Wands, 858 F.2d at 737-38 (quoting In re
Angstadt, 537 F.2d at 504; emphasis added; additional internal citations omitted). The
Examiner admits that enzyme isolation techniques and recombinant and mutagenesis
techniques are known and that it is routine to screen for multiple substitutions or multiple
modifications as encompassed by the instant claims (see Final Office Action, Exhibit 1, page
11). The art related to serine proteases also demonstrates that such experimentation is not
undue. For example, Pearson et al. (CABIOS Invited Review 13(4): 325-332 (1997) (Exhibit
29)) explains that serine proteases share a conserved catalytic site, the catalytic triad, and have
several diagnostic motifs throughout the protein, including a conserved protein fold and
antiparallel barrel structures that contribute to the function of the protease. Pearson et al. states
that one could recognize proteins that have protease activity based on these conserved
structures. Hence, generation of variants with serine protease activity is routine because one of
skill in the art can use such conserved features as a guide for designing the location of
variations to maintain these features. In addition, Cheah et al. (J. Biol. Chem. 265: 7180-7187
(1990), Exhibit 9) provides a demonstration of the predictability of generating variants of
serine proteases based on an exemplary sequence. Cheah et al. uses known structural and
functional information about trypsin-like serine proteases to obtain mutations in a rhinovirus
3C protease with predicted functional phenotypes. Thus, the art available at the time of filing,
and before, demonstrates that one of skill in the art could make variants of a serine protease in
a predictable manner. Therefore, one of skill in the art could make protease domains as single
chains from an MTSP family member and also generate variants of MTSP polypeptides, using
routine biotechnology techniques. Activity of the single chain protease domains and variants
thereof could easily and routinely be confirmed using the assays provided in the application
and known in the art. The routine manipulations to generate an MTSP single chain protease
domain are not unpredictable.

As discussed above, the issue is not whether the claims encompass variant MTSPs, but
whether one of skill in the art in possession of an MTSP could prepare an isolated protease
domain in which a free Cys is replaced with another amino acid. The instant application
identifies MTSP polypeptides and exemplifies that isolated serine protease domains possess
serine protease activity as a single chain. Such single chain activity had not been
demonstrated before the instant application. The application provides adequate
description to demonstrate that a common feature among the MTSP family members is the
activity of a single chain form that includes the protease domain or catalytically active portions
thereof in the absence of other MTSP portions. The application provides exemplary MTSPs
that share about 40% sequence identity and possess such features. As discussed, the working
examples demonstrate reproducibility, producing five different protease domains. Therefore, the
specification demonstrates that by following the teachings of the application, one of skill in the
art can predictably identify, make and use substantially purified polypeptides consisting of an
MTSP protease domain or catalytically active fragment thereof having serine protease activity
as a single chain.



vii. The amount of experimentation required

There is nothing of record to suggest that production or use of any of the claimed
polypeptides would require development of new procedures, techniques or excessive
experimentation. Protein extraction, purification and synthesis methods have been used for
decades. The specification provides a detailed working example for fermentation and isolation
of an MTSP protease domain. As discussed above, MTSP family members are provided and
described in the application and are well known in the art. The specification and the art
describe conserved features that can be used to identify MTSP family members and the
protease domain thereof. Such features include the catalytic triad, an N-terminal activation
cleavage site and conserved cysteines that participate in disulfide bonding. If needed, assays
for evaluating activity of the polypeptides are taught in the specification and are known in the
art. Such assays are routine in this art and do not require excessive experimentation.

The Examiner states that recombinant and mutagenesis techniques and enzyme 
isolation techniques are known and that it is routine to screen for multiple substitutions or 
multiple modifications as encompassed by the instant claims (see Final Office Action, Exhibit 
1, page 11). As discussed, mutagenesis methods are not required to make and use the 
polypeptides as claimed. The instant claims are directed to isolated protease domains of 
MTSP family members; one of skill in the art can identify and isolate the protease domain of 
any MTSP family member, identify a free Cys and replace it with another amino acid as
described in the application. Hence, the claimed polypeptides can be synthesized, isolated and 
characterized using routine testing, and, if necessary, one of skill in the art can test 
polypeptides for catalytic activity by routine experimentation using the assays provided in the 
specification or known to those of skill in the art. Appellant notes that "a considerable amount of
experimentation is permissible, if it is merely routine . . . ." In re Wands, 858 F.2d 731, 737 (Fed. Cir. 1988).

Conclusion 

In light of the breadth of the claims, the extensive teachings and examples in the 
specification, the high level of skill of those in this art, the knowledge of those of skill in the 
art, and the fact that identification and isolation of protease domains in MTSP family members and
preparation of single chain forms thereof as well as variants thereof is predictable and has been
reproducibly demonstrated, it would not require undue experimentation for one of skill in the 
art to make and use polypeptides with the features as claimed. Hence, a consideration of the 
factors enumerated above leads to the conclusion that undue experimentation would not be 
required to make and use the isolated MTSP protease domains as claimed. Accordingly, 
Appellant respectfully submits that this rejection of claim 1 under 35 U.S.C. §112, first
paragraph, is erroneous in law and fact and, therefore, should be reversed. 

For the reasons above, each of the dependent claims meets the written description
requirement and is enabled. In addition, further reasons specific to each dependent claim are
described below.

Dependent Claim 11 

Claim 11 depends from claim 1 and includes every limitation thereof. Claim 11 recites
that the MTSP of the polypeptide of claim 1 is selected from among MTSP1, MTSP3, MTSP4
and MTSP6. The arguments set forth above with respect to claim 1 are incorporated herein.
The specification describes MTSP1 and its protease domain, e.g., at pages 54-58. The
specification describes MTSP3 and its protease domain, e.g., at pages 58-60 and Example 1 (pages
160-167). The specification describes MTSP4 and its protease domain, e.g., at pages 60-63 and
Example 2 (pages 167-171). The specification describes MTSP6 and its protease domain, e.g., at
pages 63-64 and Example 3 (pages 171-176). The working examples demonstrate cloning of
the protease domains, with the free Cys replaced, for each of these.

In light of the breadth of the claims, the extensive teachings and examples in the 
specification, the high level of skill of those in this art, the knowledge of those of skill in the 
art, and the fact that it is predictable to identify protease domains in MTSP family members 
and prepare single chain forms thereof as well as variants thereof, it would not require undue 
experimentation for one of skill in the art to make and use polypeptides with the features as 
claimed. Hence, a consideration of the factors enumerated above leads to the conclusion that 
undue experimentation would not be required to make and use the isolated MTSP protease 
domains of MTSP1, MTSP3, MTSP4 or MTSP6 of claim 11. Accordingly, Appellant
respectfully submits that this rejection of claim 11 under 35 U.S.C. §112, first paragraph, is
erroneous in law and fact and, therefore, should be reversed. 
Dependent Claim 20 

Claim 20 depends from claim 1 and includes every limitation thereof. The arguments
set forth above with respect to claim 1 are incorporated herein. Claim 20 recites that the free
Cys is replaced with a serine. The Examiner admits that recombinant and mutagenesis
techniques are known in the art (see Final Office Action, Exhibit 2, page 11). The 
specification exemplifies the replacement of a free Cys in the protease domain with a serine 
residue. For example, see Example 1, which recites, on page 161, lines 4-9: 

To eliminate the free cysteine (at position 310 in SEQ ID No. 4) that exists when the 
protease domain of the MTSP3 protein is expressed or the zymogen is activated, the 
free cysteine at position 310 (see SEQ ID No. 3), which is Cys 122 if a chymotrypsin 
numbering scheme is used, was replaced with a serine. 

Similarly, the working examples provide MTSP4, MTSP6 and MTSP1 with the free Cys
replaced with serine. One of skill in the art readily can identify the protease domain of any
MTSP family member, identify a free Cys and replace it with a serine residue. Such
substitutions of amino acids are predictable and routine in the art. 

In light of the breadth of claim 20, the extensive teachings and examples in the 
specification, the high level of skill of those in this art, the knowledge of those of skill in the 
art, and the fact that it is predictable to replace a Cys with another amino acid residue, such as a
serine residue, it would not require undue experimentation for one of skill in the art to make 
and use polypeptides with the features as claimed. Hence, a consideration of the factors 
enumerated above leads to the conclusion that undue experimentation would not be required to 
make and use the isolated MTSP protease domains of claim 20. Accordingly, Appellant 
respectfully submits that this rejection of claim 20 under 35 U.S.C. §112, first paragraph, is 
erroneous in law and fact and, therefore, should be reversed. 
Dependent Claim 34 

Claim 34 depends from claim 1 and includes every limitation thereof. The arguments 
set forth above with respect to claim 1 are incorporated herein. Claim 34 recites that the MTSP is
selected from among corin, MTSP1, enteropeptidase, human airway trypsin-like protease
(HAT), TMPRSS2, and TMPRSS4. For the reasons articulated above with respect to claim 
1, Appellant respectfully submits that the specification is enabling for preparation and use of 
a substantially purified single-chain polypeptide consisting only of a protease domain of a 
type-II membrane-type serine protease (MTSP) or a catalytically active fragment thereof as a 
single chain, where the MTSP protease domain or catalytically active fragment thereof has 
serine protease activity as a single chain and a free Cys in the protease domain is replaced 
with another amino acid. 

The specification specifically recites that the protease domains can be from any 
MTSP family member, including corin, MTSP1, enteropeptidase, human airway trypsin-like
protease (HAT), TMPRSS2, and TMPRSS4. For example, see page 8, line 30 through page
10, line 2, which recites:

The protease domains provided herein include, but are not limited to, the single chain 
region having an N-terminus at the cleavage site for activation of the zymogen, through 
the C-terminus, or C-terminal truncated portions thereof that exhibit proteolytic activity 
as a single-chain polypeptide in in vitro proteolysis assays, of any MTSP family member, 
preferably from a mammal, including and most preferably human, that, for example, is 
expressed in tumor cells at different levels from non-tumor cells, and that is not 
expressed on an endothelial cell. These include, but are not limited to: MTSP1 (or
matriptase), MTSP3, MTSP4 and MTSP6. Other MTSP protease domains of interest
herein, particularly for use in in vitro drug screening proteolytic assays, include, but are
not limited to: corin (accession nos. AF133845 and AB013874; see, Yan et al. (1999) J.
Biol. Chem. 274:14926-14938; Tomia et al. (1998) J. Biochem. 124:784-789; Yan et al.
(2000) Proc. Natl. Acad. Sci. U.S.A. 97:8525-8529; SEQ ID Nos. 61 and 62 for the
human protein); enteropeptidase (also designated enterokinase; accession no. U09860 for
the human protein; see, Kitamoto et al. (1995) Biochem. 27: 4562-4568; Yahagi et al.
(1996) Biochem. Biophys. Res. Commun. 219:806-812; Kitamoto et al. (1994) Proc.
Natl. Acad. Sci. U.S.A. 91:7588-7592; Matsushima et al. (1994) J. Biol. Chem.
269:19976-19982; see SEQ ID Nos. 63 and 64 for the human protein); human airway
trypsin-like protease (HAT; accession no. AB002134; see Yamaoka et al. J. Biol. Chem.
273:11894-11901; SEQ ID Nos. 65 and 66 for the human protein); hepsin (see, accession
nos. M18930, AF030065, X70900; Yamaoka et al. (1988) J Biol Chem 27:11895-11901;
Vu et al. (1997) J. Biol. Chem. 272:31315-31320; and Farley et al. (1993) Biochem.
Biophys. Acta 1173:350-352; SEQ ID Nos. 67 and 68 for the human protein); TMPRS2
(see, Accession Nos. U75329 and AF113596; Paoloni-Giacobino et al. (1997) Genomics
44:309-320; and Jacquinet et al. (2000) FEBS Lett. 468: 93-100; SEQ ID Nos. 69 and 70
for the human protein); TMPRSS4 (see, Accession No. NM_016425; Wallrapp et al.
(2000) Cancer 60:2602-2606; SEQ ID Nos. 71 and 72 for the human protein); and
TADG-12 (also designated MTSP6, see SEQ ID Nos. 11 and 12; see International PCT
application No. WO 00/52044, which claims priority to U.S. application Ser. No. 
09/261,416). 

The application describes the protease domain of MTSP family members corin, MTSP1,
enteropeptidase, HAT, TMPRSS4 and TMPRSS2. Each of the specified MTSP family
members is known and characterized in the art. In view of the instant application teaching
that such protease domains have activity as single chain polypeptides, the skilled artisan can
readily isolate the protease domain of any of corin, MTSP1, enteropeptidase, human airway
trypsin-like protease (HAT), TMPRSS2, and TMPRSS4 as a single chain and replace the free
Cys with another amino acid using routine techniques and, if necessary, test the isolated
protease domain for the requisite activity. 

Appellant respectfully submits that, in view of the arguments set forth above with 
respect to claim 1 and the teaching in the specification, which describes the MTSP family 
members corin, enteropeptidase, HAT, TMPRSS4 and TMPRSS2, the breadth of claim 34, 
the extensive teachings and examples in the specification, the high level of skill of those in 
this art, the knowledge of those of skill in the art, and the fact that it is predictable to isolate a 
protease domain and replace a Cys with another amino acid residue, it would not require 
undue experimentation for one of skill in the art to make and use polypeptides with the 
features of claim 34. Hence, a consideration of the factors enumerated above leads to the 
conclusion that undue experimentation would not be required to make and use the isolated 
MTSP protease domains of claim 34. Accordingly, Appellant respectfully submits that this 
rejection of claim 34 under 35 U.S.C. §112, first paragraph, is erroneous in law and fact and, 
therefore, should be reversed. 
Dependent Claim 35 

Claim 35 is directed to a conjugate that includes a) a polypeptide of claim 1, and b) a 
targeting agent linked to the protein directly or via a linker, wherein the conjugate has serine 
protease activity. The arguments set forth above with respect to claim 1 are incorporated 
herein. The specification defines a "targeting agent" on page 38, lines 9-15, as: 
any moiety, such as a protein or effective portion thereof, that provides specific
binding of the conjugate to a cell surface receptor, which, preferably, internalizes 
the conjugate or MTSP portion thereof. A targeting agent may also be one that 
promotes or facilitates, for example, affinity isolation or purification of the 
conjugate; attachment of the conjugate to a surface; or detection of the conjugate 
or complexes containing the conjugate. 

The specification teaches that the conjugates can be prepared by chemical conjugation, 
recombinant DNA technology or combinations thereof, and provides detailed descriptions of 
chemical conjugation, including acid cleavable, photo-cleavable and heat sensitive linker 
technology and other linkers, preparation of fusion proteins, peptide linkers, conjugation to
targeting agents, and adsorption, absorption and/or covalent bonding to a solid support (see,
e.g., pages 123-131). For example, the specification teaches that for the fusion proteins, the
peptide or fragment thereof is linked to either the N-terminus or C-terminus of the MTSP
protein domain (e.g., see page 124, lines 25-26). The specification teaches that chemical
conjugation also can be used to form conjugates, where the MTSP protein domain is linked
via one or more selected linkers or directly to the targeting agent (e.g., see page 126, lines
2-3). The specification describes various types of linkers and describes examples of various
linkers, including peptide linkers and chemical linkers, such as acid cleavable, photo-cleavable
and heat cleavable linkers (e.g., see pages 127-130). Methods of preparing protein
conjugates are well known and routine in the art (e.g., see Brinkley, "A Brief Survey of
Methods for Preparing Protein Conjugates with Dyes, Haptens, and Cross-linking Reagents"
in Perspectives in Bioconjugate Chemistry (Claude Meares, ed., 1993), Chapter 4, pages
59-70, Exhibit 6). Hence, routine techniques can be used to conjugate isolated protease domains
to a targeting agent.

Appellant respectfully submits that, in view of the arguments set forth above with 
respect to claim 1 and the teaching in the specification, which describes conjugates of single- 
chain protease domains conjugated to a targeting agent, several different types of conjugation 
technologies for making the conjugates and exemplary conjugates, the breadth of claim 35, 
the high level of skill of those in this art, the knowledge of those of skill in the art, and the 
fact that it is routine and predictable to conjugate a polypeptide to a targeting agent, it would 
not require undue experimentation for one of skill in the art to make and use conjugates with 
the features of claim 35. Hence, a consideration of the factors enumerated above leads to the 
conclusion that undue experimentation would not be required to make and use the conjugates 
of claim 35. Accordingly, Appellant respectfully submits that this rejection of claim 35 under 
35 U.S.C. §112, first paragraph, is erroneous in law and fact and, therefore, should be
reversed. 

Dependent Claim 36 

Claim 36 depends from claim 35 and recites a conjugate that includes a targeting 
agent that permits i) affinity isolation or purification of the conjugate; ii) attachment of the 
conjugate to a surface; iii) detection of the conjugate; or iv) targeted delivery to a selected 
tissue or cell. The arguments set forth above with respect to claims 1 and 35 are incorporated 
herein. 

The specification recites, e.g., at page 14, lines 19-26 and page 123, line 30 through
page 124, line 7, that the targeting agent of the conjugate permits affinity isolation or 
purification of the conjugate; attachment of the conjugate to a surface; detection of the 
conjugate; or targeted delivery to a selected tissue or cell. The specification teaches 
exemplary targeting agents, including tissue specific or tumor specific monoclonal 
antibodies, a growth factor or fragment thereof, such as FGF, EGF, PDGF, VEGF, cytokines,
including chemokines, and other such agents, a protein or peptide fragment that contains a
protein binding sequence, a nucleic acid binding sequence, a lipid binding sequence, a
polysaccharide binding sequence, or a metal binding sequence, or a linker for attachment to a
solid support (see, e.g., page 124, lines 8-17 and pages 131-136). The specification also
describes the construction of affinity binding pairs for isolation and/or purification of the
conjugate (e.g., see page 131, lines 5-37). Methods of preparing protein conjugates are well
known and routine in the art (e.g., see Brinkley, supra, Exhibit 6). Hence, routine,
reproducible techniques well known to the skilled artisan can be used to conjugate isolated 
protease domains to a targeting agent. 

Appellant respectfully submits that, in view of the arguments set forth above with
respect to claims 1 and 35, and the teaching in the specification, which describes single-chain 
protease domains conjugated to a targeting agent and the use of such targeting agents for 
affinity isolation or purification of the conjugate or attachment of the conjugate to a surface 
or detection of the conjugate or targeted delivery to a selected tissue or cell, the breadth of 
claim 36, the high level of skill of those in this art, the knowledge of those of skill in the art, 
and the fact that it is routine and predictable to conjugate a polypeptide to a targeting agent, it 
would not require undue experimentation for one of skill in the art to make and use 
conjugates with the features of claim 36. Hence, a consideration of the factors enumerated 
above leads to the conclusion that undue experimentation would not be required to make and
use the conjugates of claim 36. Accordingly, Appellant respectfully
submits that this rejection of claim 36 under 35 U.S.C. §112, first paragraph, is erroneous in
law and fact and, therefore, should be reversed. 
Dependent Claims 40 and 41 

Claim 40 recites a solid support comprising two or more polypeptides of claim 1 
linked thereto either directly or via a linker. Claim 41 depends from claim 40 and recites that 
the polypeptides comprise an array. The arguments set forth above with respect to claim 1 
are incorporated herein. 

The specification describes solid supports and methods for immobilizing MTSP 
protein, such as a protease domain, to solid supports (e.g., see pages 131-136). For example, 
the specification teaches exemplary solid supports, including supports having any required 
structure and geometry, such as beads, pellets, disks, capillaries, hollow fibers, needles, solid 
fibers, random shapes, thin films and membranes (e.g., page 132, lines 26-29). The 
specification teaches that the solid support can be of any suitable material, such as inorganics, 
natural polymers, and synthetic polymers, including cellulose, cellulose derivatives, acrylic
resins, glass, silica gels, polystyrene, gelatin, polyvinyl pyrrolidone, co-polymers of vinyl and 
acrylamide, polystyrene cross-linked with divinylbenzene, polyacrylamides, latex gels, 
polystyrene, dextran, polyacrylamides, rubber, silicon, plastics, nitrocellulose, celluloses, 
natural sponges and highly porous glasses (e.g., page 134, lines 1-30).

The specification teaches that a plurality of MTSP protease domains, including two or 
more protease domains, can be attached to a solid support (e.g., page 132, lines 4-8). The 
instant specification defines an array as a collection of elements containing three or more 
members and that, as in the case for an addressable array, the members of the array can be 
immobilized to discrete identifiable loci on the surface of a solid phase (see, e.g., page 35,
lines 14-20). 

The specification teaches that the polypeptide can be linked to the solid support 
directly or via a linker (e.g., page 132, lines 1-2). The specification describes various linking
technologies that can be used to link the polypeptide to the solid support (e.g., page 135, lines 
1-30). These include reacting the protein with a reactive moiety on the solid support. The
specification describes exemplary reactive moieties, including amino silane linkages,
hydroxyl linkages, carboxysilane linkages, N-[3-(triethoxysilyl)propyl]phthalamic acid,
bis-(2-hydroxyethyl)aminopropyltriethoxysilane and derivatized polystyrenes (page 133, lines
7-26). The specification also describes absorption and adsorption or covalent binding to the
support, either directly or via a linker, such as through disulfide linkages, thioether bonds, and
covalent bonds between free reactive groups, such as amine and thiol groups, known to those
of skill in the art (page 135, lines 11-26). Linking a protein to a solid support is routine in the
biotechnology arts (e.g., see Means & Feeney, "Chemical Modifications of Proteins: History
and Applications" in Perspectives in Bioconjugate Chemistry (Claude Meares, ed., 1993),
Chapter 2, pages 10-20, Exhibit 23). The skilled artisan can select the appropriate conjugation
chemistry based on the nature of the polypeptide and the solid support without undue
experimentation and conjugate the protease domain to the solid support using routine
techniques known in the art.

In light of the breadth of claims 40 and 41, the extensive teachings in the specification 
with respect to solid supports and conjugating polypeptides thereto, including conjugating a 
plurality of isolated protease domains to a solid support, the high level of skill of those in this 
art, and the knowledge of those of skill in the art, Appellant respectfully submits that it would
not require undue experimentation for one of skill in the art to make and use the solid supports
of claim 40 or the arrays of claim 41. Hence, a consideration of the factors enumerated above
leads to the conclusion that undue experimentation would not be required to make and use the
solid supports of claim 40, comprising two or more polypeptides of claim 1 linked thereto
either directly or via a linker, or the arrays of claim 41. Accordingly, Appellant respectfully
submits that this rejection of claims 40 and 41 under 35 U.S.C. §112, first paragraph, is 
erroneous in law and fact and, therefore, should be reversed. 
Dependent Claim 42 

Claim 42 depends from claim 41 and recites that the array comprises polypeptides
having different MTSP protease domains. The arguments set forth above with respect to 
claims 1, 40 and 41 are incorporated herein. The specification teaches that a plurality of 
MTSP protease domains can be attached to a solid support (e.g., see page 132, lines 4-8).
Linking a protein to a solid support is routine in the biotechnology arts (e.g., see Means &
Feeney, "Chemical Modifications of Proteins: History and Applications" in Perspectives in
Bioconjugate Chemistry (Claude Meares, ed., 1993), Chapter 2, pages 10-20, Exhibit 23).
Whether the protein to be conjugated to a solid support is a single species or multiple species 
of MTSP protease domain does not change the amount of experimentation required to form 
the claimed array. The skilled artisan readily can select the appropriate conjugation 
chemistry based on the nature of the polypeptides and the solid support without undue
experimentation and conjugate the polypeptide to the support using routine methods. 

In light of the breadth of claim 42, the extensive teachings in the specification with 
respect to solid supports and conjugating polypeptides thereto, including conjugating a 
plurality of isolated protease domains to a solid support, the high level of skill of those in this 
art, and the knowledge of those of skill in the art, Appellant respectfully submits that it would 
not require undue experimentation for one of skill in the art to make and use the arrays of claim 
42. Hence, a consideration of the factors enumerated above leads to the conclusion that undue 
experimentation would not be required to make and use the arrays of claim 42. Accordingly, 
Appellant respectfully submits that this rejection of claim 42 under 35 U.S.C. §112, first 
paragraph, is erroneous in law and fact and, therefore, should be reversed. 
Dependent Claims 113 and 114 

Claim 113 recites a solid support comprising two or more polypeptides of claim 12
linked thereto either directly or via a linker. Claim 114 depends from claim 113 and recites
that the polypeptides comprise an array. Hence, each of claims 113 and 114 includes the
polypeptide of claim 12 as an element. Claim 12 is not rejected under 35 U.S.C. §112, first
paragraph. Accordingly, the Examiner admits that the specification is enabling for the
subject matter of claim 12, which is directed to the substantially purified polypeptide of claim
1, wherein the MTSP protease domain consists of a sequence of amino acid residues selected
from among amino acids 615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID NO. 4, 
the amino acid residues set forth as SEQ ID No. 6 or as amino acids 217-443 in SEQ ID No. 
12. 

The specification describes solid supports and methods for immobilizing MTSP 
protein to solid supports (e.g., see pages 131-136). For example, the specification teaches 
exemplary solid supports, including supports having any required structure and geometry, 
such as beads, pellets, disks, capillaries, hollow fibers, needles, solid fibers, random shapes, 
thin films and membranes (e.g., page 132, lines 26-29). The specification teaches that the 
solid support can be of any suitable material, such as inorganics, natural polymers, and 
synthetic polymers, including cellulose, cellulose derivatives, acrylic resins, glass, silica gels,
polystyrene, gelatin, polyvinyl pyrrolidone, co-polymers of vinyl and acrylamide, polystyrene 
cross-linked with divinylbenzene, polyacrylamides, latex gels, polystyrene, dextran, 
polyacrylamides, rubber, silicon, plastics, nitrocellulose, celluloses, natural sponges and
highly porous glasses (e.g., page 134, lines 1-30).

The specification teaches that a plurality of MTSP protease domains, including two or 
more protease domains, can be attached to a solid support (e.g., page 132, lines 4-8). The
instant specification defines an array as a collection of elements containing three or more 
members and that, as in the case for an addressable array, the members of the array can be 
immobilized to discrete identifiable loci on the surface of a solid phase (see page 35, lines
14-20).

The specification teaches that the polypeptide can be linked to the solid support 
directly or via a linker (e.g., page 132, lines 1-2). The specification describes various linking
technologies that can be used to link the polypeptide to the solid support (e.g., page 135, lines
1-30). These include reacting the protein with a reactive moiety on the solid support. The
specification describes exemplary reactive moieties, including amino silane linkages,
hydroxyl linkages, carboxysilane linkages, N-[3-(triethoxysilyl)propyl]phthalamic acid and
derivatized polystyrenes (page 133, lines 7-26). The specification also describes absorption
and adsorption and covalent binding to the support, either directly or via a linker, such as via
disulfide linkages or thioether bonds, and covalent bonds between free reactive groups, such
as amine and thiol groups, known to those of skill in the art (page 135, lines 11-26). Linking a
protein to a solid support is routine in the biotechnology arts (e.g., see Means & Feeney,
"Chemical Modifications of Proteins: History and Applications" in Perspectives in
Bioconjugate Chemistry (Claude Meares, ed., 1993), Chapter 2, pages 10-20, Exhibit 23).
The skilled artisan readily can select the appropriate conjugation chemistry based on the 
nature of the polypeptides and the solid support without undue experimentation and conjugate 
the polypeptide to the support using routine methods. 

In light of the breadth of claims 113 and 114, the extensive teachings in the 
specification with respect to solid supports and conjugating polypeptides thereto, including 
conjugating a plurality of isolated protease domains to a solid support, the high level of skill of 
those in this art, the knowledge of those of skill in the art, and the fact that the Examiner admits 
that the specification is enabling for the polypeptides of claim 12, Appellant respectfully 
submits that it would not require undue experimentation for one of skill in the art to conjugate 
the polypeptides of claim 12 to solid supports to make the solid supports of claim 113 and 
arrays of claim 114. Hence, a consideration of the factors enumerated above leads to the 
conclusion that undue experimentation would not be required to make and use the solid
supports of claim 113, comprising two or more polypeptides of claim 12 linked thereto either
directly or via a linker, or the arrays of claim 114. Accordingly, Appellant respectfully submits
that this rejection of claims 113 and 114 under 35 U.S.C. §112, first paragraph, is erroneous in 
law and fact and, therefore, should be reversed. 



In light of the breadth of the claims, the extensive teachings and examples in the 
specification, the high level of skill of those in this art, the knowledge of those of skill in the 
art, and the fact that it is predictable to identify protease domains in MTSP family members 
and prepare single chain forms thereof as well as variants thereof, it would not require undue 
experimentation for one of skill in the art to make and use polypeptides with the features as 
claimed, or conjugates, solid supports or arrays that include the polypeptides. Hence, a 
consideration of the factors enumerated above leads to the conclusion that undue 
experimentation would not be required to make and use the subject matter as claimed. 
Accordingly, Appellant respectfully submits that this rejection of claims 1, 11, 20, 34-36,
40-42, 113 and 114 under 35 U.S.C. §112, first paragraph, is erroneous in law and fact and,
therefore, should be reversed. 

3. REJECTION OF CLAIMS 1, 11-13, 20, 34-36, 40-42, 113 AND 114 UNDER 35 U.S.C. 
§102(b) - Takeuchi 

Claims 1, 11-13, 20, 34-36, 40-42, 113 and 114 are rejected under 35 U.S.C. §102(b) as
being anticipated by Takeuchi, because the reference allegedly discloses "a polypeptide
comprising a fragment consisting of a serine protease domain that is 100% identical to amino
acids 615-855 of SEQ ID NO:2 of the instant invention" and discloses "a catalytically active
polypeptide comprising the serine protease domain linked to a His-tag." The Examiner states
that Takeuchi discloses that Cys at position 731 forms a disulfide bond with Cys 604 present in
the pro domain (see Final Office Action, Exhibit 2, page 17). The Examiner alleges that the
claim limitations "a free Cys in the protease domain is replaced with another amino acid" and "a
free Cys in the protease domain is replaced with a serine" are product-by-process type
limitations. The Examiner alleges that

[t]he end result of the products of the claims is a serine protease domain or a 
serine protease domain having a serine residue. Whether the product of the 
claimed protein is obtained by replacing a free cysteine residue or not, the
product is still the same because the instant claims may be produced by the
recited modification or not. Therefore, there is no there a structure implied by
said limitations. Since the polypeptide of Takeuchi et al. consists of a protease
domain of a MTSP and the MTSP protease domain has serine protease activity,
the claims are anticipated by the prior art. Also, since the serine protease domain
of Takeuchi et al. has a serine residue, claim 20 is also anticipated.

The rejection respectfully is traversed. 

A. LEGAL STANDARDS - ANTICIPATION UNDER 35 U.S.C. § 102 

Anticipation is a factual determination that ". . . requires the presence in a single prior
art disclosure of each and every element of a claimed invention." Lewmar Marine, Inc. v.
Barient, Inc., 3 U.S.P.Q.2d 1766 (Fed. Cir. 1987). Moreover, "[a] claim is anticipated only if
each and every element as set forth in the claim is found, either expressly or inherently
described, in a single prior art reference." Verdegaal Bros. v. Union Oil of California, 2
U.S.P.Q.2d 1051, 1053 (Fed. Cir. 1987) (emphasis added).

Federal Circuit decisions have repeatedly emphasized the notion that anticipation 
cannot be found where less than all elements of a claimed invention are set forth in a 
reference. See, e.g., Transclean Corp. v. Bridgewood Services, Inc., 290 F.3d 1364 (Fed. Cir.
2002). In this regard, a reference disclosing "substantially the same thing" is not enough to 
anticipate. Jamesbury Corp. v. Litton Indust. Prod., Inc., 756 F.2d 1556, 1560 (Fed. Cir. 
1985). A reference must clearly disclose each and every limitation of the claimed invention 
before anticipation may be found. 

Further, anticipation cannot be shown by combining more than one reference to show 
the elements of the claimed invention. In re Saunders, 444 F.2d 599 (C.C.P.A. 1971). All
elements of a claimed invention must be disclosed in one, solitary reference. As such, it is 
clear that a reference cannot be utilized to render a claimed invention anticipated without 
identical disclosure. 

B. THE REJECTION OF CLAIMS 1, 11-13, 20, 34-36, 40-42, 113 AND 114
UNDER 35 U.S.C. §102(b) SHOULD BE REVERSED BECAUSE TAKEUCHI
DOES NOT ANTICIPATE THE CLAIMED SUBJECT MATTER

1. Disclosure of Takeuchi 

Takeuchi discloses a polypeptide that contains 855 amino acids and is designated MT-SP1.
This protein has sequence identity with the full-length MTSP1 set forth as SEQ ID NO:2
of the instant application. Takeuchi discloses an expression vector that includes nucleic acid
encoding the protease domain plus the pro-domain (see page 11055, left col., third full
paragraph). Takeuchi discloses that its expression vector includes the mature protease domain
and a small portion of the pro-domain and was designed to over-express the sequence encoding
a polypeptide containing amino acids 596-855 with a His-tag fusion to produce the construct
Met-Arg-Gly-Ser-His6-aa596-855 (page 11055, column 2, third full paragraph). Takeuchi
discloses that amino acids Cys 604 and Cys 731 are disulfide bonded (see, for example, page
11060, col. 1). Takeuchi discloses that its protease domain is disulfide bonded to the
pro-domain region (see page 11055, column 2, third full paragraph; page 11058, col. 1; and
page 11060, col. 1, first paragraph) and that the pro-domain region remains bonded to the
protease domain after activation (page 11058, lines 8-9).

Takeuchi discloses that its "purified protease domain" includes the His-tag sequence 
and the pro-domain region bonded thereto, stating that a monoclonal antibody directed against 
the N-terminal Arg-Gly-Ser-His4 epitope is immunoreactive with its purified protein (see page
11058). It is not an isolated single chain protease domain. It is a two chain structure and it
includes amino acids in addition to the protease domain. Figure 3, cited by the Examiner as
showing an isolated protease domain, is a diagrammatic representation of the MTSP1 protease
domains; it by no means is an isolated protease domain. Furthermore, the figure depicts the
disulfide bonds and does not show a free Cys in the protease domain, nor a fragment consisting
of the protease domain. Page 11057, referenced by the Examiner as describing isolation of the
protease domain, does not do so. The polypeptide is expressed as a His-tagged polypeptide that
forms a two-chain structure by virtue of the Cys-Cys disulfide bonds depicted in Figure 3.
Furthermore, the paper discusses the activated His-tag extended polypeptide and describes its
activity (see, e.g., Figure 6 and page 11057, col. 2). Takeuchi states that:



the MT-SP1 protease domain was expressed in E. coli as a His-tagged fusion and
was purified from inclusion bodies under denaturing conditions by using metal-chelate
affinity chromatography. . . . This denatured protein refolded when the
urea was dialyzed from the protein. . . . N-terminal sequencing of the purified
activated [i.e. the two-chain folded form] yielded the expected WGGT
activation sequence.

Thus, Takeuchi expresses a His-tagged form of the protein, which includes a protease domain 
and a pro-domain region, that forms a two chain structure when activation- cleaved. The 
sequenced molecule includes the His-tagged protease domain. Takeuchi does not disclose or 
contemplate an isolated polypeptide consisting of only the protease domain and does not 
mention replacement of any Cys with Ser (the Cys in its two-chain form is not free).

Further, it is apparent from the disclosure that Takeuchi believes that a two-chain 
structure is a requisite for activity. Takeuchi discusses the need for activation cleavage and 
depicts the disulfide bond; there is no disclosure of a polypeptide in which there is a free Cys. 
Hence, there is no disclosure of replacing any free Cys with another amino acid, such as a
serine. There is no mention of replacement of any amino acids in its polypeptide. 

Hence, Takeuchi does not disclose isolation of a polypeptide consisting only of the
protease domain of any MTSP, including an MTSP1. Its polypeptide includes a His-tag
sequence; the active form of the enzyme includes a disulfide bond between the protease
domain and a pro-domain region. In addition, the only isolation of a polypeptide including the
protease domain (which includes the His-tag) was for sequencing purposes.

2. Analysis

Independent Claim 1

In maintaining the rejection, the Examiner states on page 18 of the Final Office Action
(Exhibit 2) that:

[t]he limitation "a free Cys in the protease domain is replaced with another amino acid"
and "a free Cys in the protease domain is replaced with a serine" is a product-by-process
type limitation. The end result of the products of the claims is a serine protease domain or
a serine protease domain having a serine residue. Whether the product of the claimed
protein is obtained by replacing a free cysteine residue or not, the product is still the same
because the instant claims may be produced by the recited modification or not. Therefore,
there is no [] structure implied by said limitations. Since the polypeptide of Takeuchi et
al. consists of a protease domain of a MTSP and the MTSP protease domain has serine
protease activity, the claims are anticipated by the prior art. Also, since the serine protease
domain of Takeuchi et al. has a serine residue, claim 20 is also anticipated.

Appellant respectfully disagrees. Claim 1 recites that the isolated substantially purified
polypeptide consists only of a protease domain or a smaller catalytically active portion of the
protease as a single chain, and that a free Cys residue of the serine protease domain is replaced
with another amino acid. This is not a "product-by-process type" limitation as alleged by the
Examiner, but a limitation on the molecular structure of the single chain polypeptide.

A product-by-process claim is a product claim that defines the claimed product in terms
of the process by which it is made. In re Luck, 476 F.2d 650, 177 USPQ 523 (CCPA 1973); In
re Pilkington, 411 F.2d 1345, 162 USPQ 145 (CCPA 1969); In re Steppan, 394 F.2d 1013, 156
USPQ 143 (CCPA 1967). Appellant respectfully submits that the instant claims do not define
the product in terms of the process by which it is made. The specification teaches that a
single-chain form of a serine protease domain has a free Cys residue. For example, page 58,
lines 12-20 recites:

Muteins of the MTSP1 proteins are provided. In the activated double chain molecule,
residue 731 forms a disulfide bond with the Cys at residue 604. In the single chain form,
the residue at 731 in the protease domain is free. Muteins in which Cys residues,
particularly the free Cys residue (amino acid 731 in SEQ ID No. 2) in the single chain
protease domain [is replaced] are provided. Other muteins in which conservative amino
acids replacements are effected and that retain proteolytic activity as a single chain are
also provided. Such changes may be systematically introduced and tested for activity in in
vitro assays, such as those provided herein.

The Cys residue in the protease domain in the MTSP protein forms a disulfide bond with a Cys
residue in the pro-domain region, and autoactivation results in a polypeptide with a two-chain
structure by virtue of the Cys-Cys disulfide bonds. Isolating the serine protease domain so that 
it is free from the pro-domain region results in unpaired Cys residues, because the single-chain 
isolated protease domain is not bonded to a Cys in another region of the protein, such as the 
pro-domain region. Hence, the isolated polypeptide consisting only of the protease domain 
will have a free Cys residue (a Cys residue that "does not form disulfide linkages with any 
other Cys residue in the protein," see page 10, lines 5-6 of the instant specification). Thus, the 
isolation of the protease domain results in a free Cys residue. Isolation of the protease domain 
does not result in a free Cys residue that is replaced with another amino acid. Further, the 
single chain protease domain can be made by recombinant expression
in a vector, thus eliminating the need to "isolate" it from the expressed zymogen form of the
enzyme. The isolated single chain form of the serine protease domain is not produced by
replacing a free Cys residue with another amino acid. Hence, the claimed polypeptide is not
defined in terms of the process by which it was made. Accordingly, the instant claims are not
"product-by-process" claims. The polypeptides of Takeuchi et al. are two-chain polypeptides
and do not contain a free Cys; hence they cannot contain a replaced free Cys.

The limitation "a free Cys residue of the serine protease domain is replaced with another
amino acid" is a structural limitation on the molecular architecture of the polypeptide. Cys
residues readily form disulfide bonds due to the presence of the sulfhydryl group (e.g., see
Zubay, Biochemistry (1983), pages 12-13, Exhibit 45). Other amino acid residues do not have
this functionality. For example, serine residues have a hydroxyl group instead of a sulfhydryl
group and thus do not form disulfide bonds. Hence, replacing a free Cys residue in the
protease domain of the polypeptide with another amino acid, such as a serine residue, as is
claimed in claim 20, results in a protease domain that cannot form a disulfide bond with
another region in the polypeptide. Hence, the recited limitation is a structural limitation. If the
claims recited "wherein a sulfhydryl group is replaced with another functionality" instead of
"wherein a free Cys residue of the serine protease domain is replaced with another amino acid,"
there would be no question that the recitation is a structural limitation on the claimed
compound. Because the recitation limits the structure of the polypeptide, the recited limitation
"a free Cys residue of the serine protease domain is replaced with another amino acid" should be
afforded patentable weight. "All words in a claim must be considered in judging the
patentability of that claim against the prior art." In re Wilson, 424 F.2d 1382, 1385, 165 USPQ
494, 496 (CCPA 1970).

Appellant respectfully submits that Takeuchi does not disclose every element of the 
claimed subject matter. 

(1) Free Cys residue

Takeuchi does not disclose a serine protease domain of an MTSP polypeptide that has a 
free Cys residue. Figure 3 of Takeuchi, for example, is a diagrammatic representation of the 
full-length MTSP1 depicting the activated disulfide-bonded form of the enzyme, in which the
Cys residue of the protease domain is part of a disulfide bond with a Cys residue in the
pro-domain. Figure 4 of Takeuchi, which shows multiple sequence alignments of MTSP1
structural motifs, identifies Cys residues that participate in disulfide bonds. All of the Cys
residues in Figure 4 are shown as being disulfide bonded; there are no free Cys residues.
Takeuchi discloses that its protease domain is disulfide bonded to the pro-domain region and
that the pro-domain region remains bonded to the protease domain after activation, and thus
Takeuchi does not disclose a
protease domain having a free Cys residue. 

(2) Replacing a free Cys residue with another amino acid

There is no disclosure in Takeuchi with respect to replacement of any amino acid in its 
polypeptide. Takeuchi does not disclose replacing any amino acid in the serine protease 
domain with another amino acid. As discussed above, Takeuchi does not disclose a serine 
protease domain of an MTSP polypeptide that has a free Cys residue. Hence, Takeuchi does 
not disclose replacing a free Cys residue of the serine protease domain of an MTSP 
polypeptide with another amino acid. 

The Examiner's argument that "the serine protease domain of Takeuchi has a serine
residue" and thus "claim 20 is also anticipated" is incorrect. Claim 20 does not recite a serine
protease domain that has a serine residue. The claims recite that a free Cys residue of the 
serine protease domain of an MTSP polypeptide is replaced with another amino acid. There is 
no disclosure in Takeuchi of a protease domain of an MTSP polypeptide having a free Cys 
residue of the serine protease domain replaced with another amino acid. It is irrelevant 
whether other amino acid residues in the protease domain are serine residues. 
(3) An isolated, substantially purified protease domain of an MTSP
polypeptide

Takeuchi discloses that its protease domain is disulfide bonded to the pro-domain
region and that the pro-domain region remains bonded to the protease domain after activation.
Takeuchi discloses that its "purified protease domain" includes the His-tag sequence, and
states that a monoclonal antibody directed against the N-terminal Arg-Gly-Ser-His4 epitope is
immunoreactive with its purified protein. Thus, the "purified protease domain" disclosed by
Takeuchi includes amino acid residues in addition to the protease domain of the MTSP1.
Neither page 11057 nor Figure 3 of Takeuchi discloses a single chain polypeptide that consists
only of the protease domain. As discussed above, the protease domain as expressed and
isolated by Takeuchi includes additional amino acids. Takeuchi states that:

N-terminal sequencing of the purified activated [i.e. the two-chain folded
form] yielded the expected WGGT activation sequence.

The purified activated polypeptide according to Takeuchi is a two chain polypeptide and, as
expressed, also includes the His-tag for purification. Figure 3, as noted, is a diagrammatic
representation of the full-length MTSP1 depicting the activated disulfide-bonded form of the
enzyme (in which the Cys that is replaced in the instant claims is part of the disulfide bond).
Hence, Takeuchi does not disclose a polypeptide consisting only of a protease domain or a
smaller catalytically active portion of the protease domain. Thus, Takeuchi does not disclose
an isolated, substantially purified protease domain of an MTSP polypeptide having a free Cys
residue replaced with another amino acid. Hence, Takeuchi does not disclose every element
of claim 1. Therefore, Takeuchi does not anticipate claim 1 nor any claim dependent thereon.
Accordingly, Appellant respectfully submits that the rejection of claim 1 as anticipated by
Takeuchi is erroneous in law and fact and, therefore, should be reversed.

For the reasons above, Takeuchi does not anticipate any of the dependent claims and, in 
addition, additional reasons why Takeuchi does not anticipate each dependent claim are 
described below. 

Dependent Claim 11 

Claim 11 depends from claim 1 and recites that the MTSP is selected from among
MTSP1, MTSP3, MTSP4 and MTSP6. Claim 11 includes every limitation of claim 1, from
which it depends. For the reasons discussed above with respect to claim 1, Takeuchi does not
disclose every element of claim 11 and therefore does not anticipate claim 11. Accordingly,
Appellant respectfully submits that the rejection of claim 11 as anticipated by Takeuchi is
erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 12

Claim 12 depends from claim 1 and recites that the MTSP protease domain consists of a
sequence of amino acid residues selected from among amino acids 615-855 of SEQ ID No. 2
(MTSP1 protease domain), amino acids 205-437 of SEQ ID NO. 4 (MTSP3), the amino acid
residues set forth as SEQ ID No. 6 (MTSP4) or as amino acids 217-443 in SEQ ID No. 12
(MTSP6), where the free Cys is replaced with Ser. Claim 12 includes every limitation of claim
1, from which it depends. For the reasons discussed above with respect to claim 1, Takeuchi
does not disclose every element of claim 12 and therefore does not anticipate claim 12.
Accordingly, Appellant respectfully submits that the rejection of claim 12 as anticipated by
Takeuchi is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 13

Claim 13 depends from claim 1 and recites that the substantially purified polypeptide
has at least about 95% sequence identity with a protease domain consisting of a sequence of
amino acid residues selected from among amino acids 615-855 of SEQ ID No. 2, amino acids
205-437 of SEQ ID NO. 4, the amino acids set forth as SEQ ID No. 6, and amino acids
217-443 in SEQ ID No. 12. Claim 13 includes every limitation of claim 1, from which it
depends. For the reasons discussed above with respect to claim 1, Takeuchi does not disclose
every element of claim 13 and therefore does not anticipate claim 13. Accordingly, Appellant
respectfully submits that the rejection of claim 13 as anticipated by Takeuchi is erroneous in
law and fact and, therefore, should be reversed.

Dependent Claim 20

Claim 20 depends from claim 1 and recites that a free Cys in the protease domain is
replaced with a serine. Claim 20 includes every limitation of claim 1, from which it depends.
As discussed above, Takeuchi does not disclose a serine protease domain of an MTSP
polypeptide that has a free Cys residue. There is no disclosure in Takeuchi with respect to
replacement of any amino acids in its polypeptide. Takeuchi does not disclose replacing any
amino acid in the serine protease domain with another amino acid. Takeuchi does not disclose
replacing a free Cys residue of the serine protease domain of an MTSP polypeptide with a
serine. Thus, for these reasons and the reasons discussed above with respect to claim 1,
Takeuchi does not disclose every element of claim 20 and therefore does not anticipate claim
20. Accordingly, Appellant respectfully submits that the rejection of claim 20 as anticipated by
Takeuchi is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 34 

Claim 34 depends from claim 1 and recites that the MTSP is selected from among corin, 
MTSP1, enteropeptidase, human airway trypsin-like protease (HAT), TMPRSS2, and
TMPRSS4. Claim 34 includes every limitation of claim 1, from which it depends. Thus, for the
reasons discussed above with respect to claim 1, Takeuchi does not disclose every element of
claim 34 and therefore does not anticipate claim 34. Accordingly, Appellant respectfully
submits that the rejection of claim 34 as anticipated by Takeuchi is erroneous in law and fact 
and, therefore, should be reversed. 

Dependent Claim 40 

Claim 40 recites a solid support comprising two or more polypeptides of claim 1 
linked thereto either directly or via a linker. Takeuchi does not disclose an isolated
single-chained polypeptide consisting only of an MTSP protease domain in which a free Cys has
been replaced with another amino acid, nor does it disclose conjugating two or more such
isolated protease domains to a solid support. Hence, there is no disclosure in Takeuchi of a solid support that
includes two or more isolated single-chained polypeptides consisting only of an MTSP 
protease domain in which a free Cys was replaced with another amino acid. Thus, for these 
reasons and the reasons discussed above with respect to claim 1, Takeuchi does not disclose
every element of claim 40 and therefore does not anticipate claim 40. Accordingly, Appellant 
respectfully submits that the rejection of claim 40 as anticipated by Takeuchi is erroneous in 
law and fact and, therefore, should be reversed. 

Dependent Claim 41 

Claim 41 recites a solid support comprising two or more polypeptides of claim 1 
linked thereto either directly or via a linker where the polypeptides comprise an array. The 
specification defines an array as a collection of elements containing three or more members. 
As discussed above, Takeuchi does not disclose isolating the protease domain and preparing 
it as a single chain and modifying the single-chain polypeptide that has a free Cys residue by 
replacing the free Cys residue with another amino acid. Takeuchi does not disclose a solid 
support that includes three or more isolated single-chained polypeptides consisting only of an 
MTSP protease domain in which a free Cys was replaced with another amino acid. Thus, for 
these reasons and the reasons discussed above with respect to claim 1, Takeuchi does not
disclose every element of claim 41 and therefore does not anticipate claim 41. Accordingly,
Appellant respectfully submits that the rejection of claim 41 as anticipated by Takeuchi is
erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 42

Claim 42 depends from claim 41 and recites that the array comprises polypeptides
having different MTSP protease domains. As discussed above, Takeuchi does not disclose
isolating the protease domain and preparing it as a single chain, nor replacing any amino acid
in the MTSP polypeptide with another amino acid. Takeuchi does not disclose modifying a
single-chain polypeptide that has a free Cys residue by replacing the free Cys residue with
another amino acid. Takeuchi does not disclose a solid support that includes three or more
isolated single-chained polypeptides consisting only of an MTSP protease domain in which a
free Cys was replaced with another amino acid. Takeuchi does not disclose a solid support
that includes three or more isolated protease domains in which a free Cys was replaced with
another amino acid, where the protease domains are from different MTSPs. Thus, for these
reasons and the reasons discussed above with respect to claim 1, Takeuchi does not disclose
every element of claim 42 and therefore does not anticipate claim 42. Accordingly,
Appellant respectfully submits that the rejection of claim 42 as anticipated by Takeuchi is
erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 113

Claim 113 recites a solid support comprising two or more polypeptides of claim 12
linked thereto either directly or via a linker. Claim 12 depends from claim 1 and specifies
that the MTSP protease domain consists of a sequence of amino acid residues selected from
among amino acids 615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID NO. 4, the
amino acid residues set forth as SEQ ID No. 6 or as amino acids 217-443 in SEQ ID No. 12.
Claim 12 includes every limitation of claim 1, from which it depends.

Takeuchi does not disclose isolating the protease domain and preparing it as a single
chain. Takeuchi does not disclose replacing any amino acid in the MTSP polypeptide with
another amino acid, and does not disclose modifying a single-chain polypeptide that has a free
Cys residue by replacing the free Cys residue with another amino acid. There is no disclosure
in Takeuchi of a solid support that includes two or more isolated single-chained polypeptides
consisting only of an MTSP protease domain in which a free Cys was replaced with another
amino acid. Thus, for these reasons and the reasons discussed above with respect to claim 1
and claim 12, Takeuchi does not disclose every element of claim 113 and therefore does not
anticipate claim 113. Accordingly, Appellant respectfully submits that the rejection of claim
113 as anticipated by Takeuchi is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 114 

Claim 114 depends from claim 113 and recites that the polypeptides comprise an
array. As discussed above, Takeuchi does not disclose isolating the protease domain and 
preparing it as a single chain. Takeuchi does not disclose replacing any amino acid in the 
MTSP polypeptide with another amino acid, and does not disclose modifying a single-chain 
polypeptide that has a free Cys residue by replacing the free Cys residue with another amino 
acid. There is no disclosure in Takeuchi of a solid support that includes three or more 
isolated single-chained polypeptides consisting only of an MTSP protease domain in which a 
free Cys was replaced with another amino acid. Thus, for these reasons and the reasons 
discussed above with respect to claim 1 and claim 113, Takeuchi does not disclose every 
element of claim 114 and therefore does not anticipate claim 114. Accordingly, Appellant
respectfully submits that the rejection of claim 114 as anticipated by Takeuchi is erroneous in
law and fact and, therefore, should be reversed. 

Summary 

Appellant respectfully submits that, in light of the above, the Examiner has failed to 
establish claims 1, 11-13, 20, 34-36, 40-42, 113 and 114 as anticipated by Takeuchi under 35
U.S.C. §102(b). Accordingly, Appellant respectfully submits that the rejection of claims
1, 11-13, 20, 34-36, 40-42, 113 and 114 as anticipated by Takeuchi is erroneous in law and fact and,
therefore, should be reversed. 

4. THE REJECTION OF CLAIMS 1, 11-13 AND 34 UNDER 35 U.S.C. §102(e)/103(a)

In the Final Office Action (Exhibit 1), on page 19, claims 1, 11-13 and 34 are rejected 
as obvious under 35 U.S.C. §103(a) over O'Brien and there is no mention of a rejection under
35 U.S.C. § 102(e), although the rejection is set forth under the heading "Claim Rejections -
35 USC §102/103." In the paragraph bridging pages 20 and 21 of the Final Office Action,
however, the Examiner states that the claims are anticipated by O'Brien. Accordingly,
Appellant separately traverses the rejection of claims 1, 11-13 and 34 under 35 U.S.C.
§ 102(e) as anticipated by O'Brien and the rejection of claims 1, 11-13 and 34 as obvious
under 35 U.S.C. §103(a) over O'Brien.
The 102(e) rejection

The Examiner alleges that the limitation "a free Cys residue of the serine protease
domain is replaced with another amino acid" is a "product-by-process type" limitation, and that
"whether the product is obtained by replacing a free cysteine residue or not, the product is still
the same because the instant claims may be produced by the recited modification or not" and
concludes that "there is no structure implied by said limitations." The Final Office Action
concludes that the disclosed molecules in O'Brien anticipate the claimed subject matter. 

A. LEGAL STANDARDS - ANTICIPATION UNDER 35 U.S.C. § 102(e)

The law with respect to anticipation under 35 U.S.C. § 102 is discussed above.

B. THE REJECTION OF CLAIMS 1, 11-13 AND 34 UNDER 35 U.S.C. §102(e)
SHOULD BE REVERSED BECAUSE O'BRIEN DOES NOT ANTICIPATE
THE CLAIMED SUBJECT MATTER

1. The disclosure of O'Brien 

O'Brien discloses a protein identified therein as TADG-15, which is an MTSP1 variant,
with a sequence of amino acids as set forth as SEQ ID NO:2. The reference also discloses a 
comparison of the amino acid sequence of the protease domain of TADG-15 (SEQ ID NO: 14) 
with other serine protease catalytic domains (see Figure 2). O'Brien discloses that TADG-15 is 
a highly over-expressed gene in tumors and suggests that TADG-15 is novel in its component 
structure of domains because it has a protease catalytic domain that could be released in vivo 
and used as a diagnostic in vivo and that potentially could be a target for therapeutic
intervention (col. 15, lines 31-38):

TADG-15 is a highly overexpressed gene in tumors. It is expressed in a 
limited number of normal tissues, primarily tissues that are involved in either 
uptake or secretion of molecules e.g. colon and pancreas. TADG-15 is further 
novel in its component structure of domains in that it has a protease catalytic 
domain which could be released and used as a diagnostic and which has the 
potential for a target for therapeutic intervention. 

Thus, O'Brien states that the TADG-15 protease domain possibly could be released in vivo
and serve as a therapeutic target, not as a therapeutic. O'Brien does not disclose, teach,
suggest, or even hint at isolating the protease domain, nor does it provide any disclosure
that isolation of a protease domain would result in a free Cys that should be replaced.

O'Brien does not disclose isolation of the protease domain as a single-chain
polypeptide that consists only of the protease domain. O'Brien does not
disclose a protease domain of an MTSP polypeptide that has a free Cys residue, or replacing
a free Cys residue of a serine protease domain of an MTSP polypeptide with another amino
acid. 

2. ANALYSIS 

Independent Claim 1 

Claim 1 recites that the isolated substantially purified polypeptide consists of a 
protease domain or a smaller catalytically active portion of the protease as a single chain, and 
that a free Cys residue of the serine protease domain is replaced with another amino acid. 
O'Brien does not disclose an isolated polypeptide that consists only of a protease domain or a 
smaller catalytically active portion of the protease as a single chain. O'Brien does not 
disclose an isolated single-chain protease domain of an MTSP polypeptide having a free Cys 
residue, or replacing a free Cys residue of an isolated single-chain serine protease domain of 
an MTSP polypeptide with another amino acid. In the previous Office Action, mailed April
21, 2006 (Exhibit 46, at page 20, lines 6-7), the Examiner states that O'Brien does not
disclose a protease domain that has been purified. Hence, O'Brien does not disclose every
element of claim 1.

In addition, as discussed above, O'Brien does not disclose an isolated protease
domain of an MTSP. Stating that such a protease domain could be released in vivo and used as
a diagnostic target does not constitute a disclosure of an isolated single-chain protease
domain, and certainly does not constitute disclosure of an isolated protease domain in which
a free Cys is replaced.

In maintaining the rejection, the Examiner states on page 20 of the Final Office Action
(Exhibit 1) that

[t]he limitation "a free Cys in the protease domain is replaced with another amino acid" is 
a product-by-process type limitation. The end result of the products of the claims is a 
serine protease domain. Whether the product of the claimed protein is obtained by 
replacing a free cysteine residue or not, the product is still the same because the instant 
claims may be produced by the recited modification or not. Therefore, there is no there a 
structure implied by said limitations. Since the polypeptide of O'Brien et al. consists of a
protease domain of a MTSP and the MTSP protease domain has serine protease activity, 
the claims are anticipated by the prior art. 

Appellant respectfully submits that the limitation "a free Cys residue of the serine protease
domain is replaced with another amino acid" is not a "product-by-process type" limitation, as
alleged by the Examiner, but a limitation on the molecular structure of the single-chain
polypeptide. A product-by-process claim is a product claim that defines the claimed product in
terms of the process by which it is made. Appellant respectfully submits that the instant claims
do not define the product in terms of the process by which it is made. As taught in the
specification (e.g., see page 58, lines 12-20, which is reproduced above in the traverse of the
rejection over Takeuchi), the Cys residue in the protease domain of the MTSP protein forms a
disulfide bond with a Cys residue in the pro-domain region, and autoactivation results in a
polypeptide with a two-chain structure by virtue of the Cys-Cys disulfide bonds. Isolating the
serine protease domain so that it is free from the pro-domain region results in unpaired Cys
residues, because the Cys residue in the single-chain isolated protease domain is not bonded to
a Cys in another region of the protein, such as the pro-domain region. Thus, the isolated
polypeptide consisting only of the protease domain will have a free Cys residue (a Cys residue
that "does not form disulfide linkages with any other Cys residue in the protein," see page 10,
lines 5-6 of the instant specification). In other words, isolation of the protease domain results
in a free Cys residue; it does not result in a free Cys residue being replaced with another amino
acid. Further, the single-chain protease domain can be made by recombinant expression in a
vector, thus eliminating the need to "isolate" it from the expressed zymogen form of the enzyme.
The isolated single-chain form of the serine protease domain is not produced by replacing a free
Cys residue. Hence, the claimed polypeptide is not defined in terms of the process by which it
was made. Accordingly, the instant claims are not "product-by-process" claims.

The limitation "a free Cys residue of the serine protease domain is replaced with another
amino acid" is a structural limitation on the molecular architecture of the polypeptide. Cys
residues readily form disulfide bonds due to the presence of the sulfhydryl group (e.g., see
Zubay, Biochemistry (1983), pages 12-13, Exhibit 45). Other amino acid residues do not have
this functionality. For example, serine residues have a hydroxyl group instead of a sulfhydryl
group and thus do not form disulfide bonds. Hence, replacing a free Cys residue in the
protease domain of the polypeptide with another amino acid, such as a Ser residue, as is
claimed in claim 20, results in a protease domain that cannot form a disulfide bond with
another region in the polypeptide. Accordingly, the recited limitation is a structural limitation.
Because the recitation limits the structure of the polypeptide, the recitation should be afforded
patentable weight. "All words in a claim must be considered in judging the patentability of
that claim against the prior art." In re Wilson, 424 F.2d 1382, 1385, 165 USPQ 494, 496
(CCPA 1970).
Hence, O'Brien does not disclose every element of claim 1. Therefore, O'Brien does
not anticipate claim 1 or any claim dependent thereon. Accordingly, Appellant respectfully
submits that the rejection of claim 1 as anticipated by O'Brien is erroneous in law and fact
and, therefore, should be reversed.

For the reasons above, O'Brien does not anticipate any of the dependent claims and, 
further, additional reasons why O'Brien does not anticipate each dependent claim are described 
below. 



Dependent Claim 11

Claim 11 depends from claim 1 and specifies that the MTSP is selected from among
MTSP1, MTSP3, MTSP4 and MTSP6. Claim 11 includes every limitation of claim 1, from
which it depends. Thus, for the reasons discussed above with respect to claim 1, O'Brien
does not disclose every element of claim 11 and therefore does not anticipate claim 11.
Accordingly, Appellant respectfully submits that the rejection of claim 11 as anticipated by
O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 12

Claim 12 depends from claim 1 and specifies that the MTSP protease domain consists
of a sequence of amino acid residues selected from among amino acids 615-855 of SEQ ID NO.
2, amino acids 205-437 of SEQ ID NO. 4, the amino acid residues set forth as SEQ ID NO. 6 or
as amino acids 217-443 in SEQ ID NO. 12. Claim 12 includes every limitation of claim 1, from
which it depends. Thus, for the reasons discussed above with respect to claim 1, O'Brien does
not disclose every element of claim 12 and therefore does not anticipate claim 12.
Accordingly, Appellant respectfully submits that the rejection of claim 12 as anticipated by
O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 13

Claim 13 depends from claim 1 and specifies that the substantially purified polypeptide
has at least about 95% sequence identity with a protease domain consisting of a sequence of
amino acid residues selected from among amino acids 615-855 of SEQ ID NO. 2, amino acids
205-437 of SEQ ID NO. 4, the amino acids set forth as SEQ ID NO. 6, and amino acids 217-443
in SEQ ID NO. 12. Claim 13 includes every limitation of claim 1, from which it depends.
Thus, for the reasons discussed above with respect to claim 1, O'Brien does not disclose every
element of claim 13 and therefore does not anticipate claim 13. Accordingly, Appellant
respectfully submits that the rejection of claim 13 as anticipated by O'Brien is erroneous in law
and fact and, therefore, should be reversed.



Dependent Claim 34

Claim 34 depends from claim 1 and specifies that the MTSP is selected from among
corin, MTSP1, enteropeptidase, human airway trypsin-like protease (HAT), TMPRSS2, and
TMPRSS4. Claim 34 includes every limitation of claim 1, from which it depends. Thus, for the
reasons discussed above with respect to claim 1, O'Brien does not disclose every element of
claim 34 and therefore does not anticipate claim 34. Accordingly, Appellant respectfully
submits that the rejection of claim 34 as anticipated by O'Brien is erroneous in law and fact and,
therefore, should be reversed.

Summary

Appellant respectfully submits that, in light of the above, the Examiner has failed to
establish claims 1, 11-13 and 34 as anticipated under 35 U.S.C. §102(b) by O'Brien.
Accordingly, Appellant respectfully submits that the rejection of claims 1, 11-13 and 34 as
anticipated by O'Brien is erroneous in law and fact and, therefore, should be reversed.

5. THE REJECTION OF CLAIMS 1, 11-13 AND 34 AND CLAIMS 35, 36, 40-42, 113
AND 114 UNDER 35 U.S.C. §103(a) - O'Brien

Claims 1, 11-13 and 34, as well as claims 35, 36, 40-42, 113 and 114, are rejected as
unpatentable over O'Brien under 35 U.S.C. §103(a) because O'Brien allegedly teaches a
method of expressing polypeptides in host cells and teaches that the protease domain
could be released from the polypeptide and used as a diagnostic that has the potential for
therapeutic intervention. Thus, the Final Office Action concludes that it would have been
obvious to one of skill in the art to express the protease domain disclosed as SEQ ID NO: 14
by O'Brien and purify the polypeptide. It is alleged that the motivation to make such
polypeptides is the disclosed use as a diagnostic for therapeutic intervention. Further, it is
alleged that one of ordinary skill in the art would have had a reasonable expectation of
success since the expression of heterologous polypeptides was routine in the art and O'Brien
teaches how to express heterologous polypeptides. The Examiner also alleges that the
limitation "a free Cys residue of the serine protease domain is replaced with another amino
acid" is a "product-by-process type" limitation, and that "whether the product is obtained by
replacing a free cysteine residue or not, the product is still the same because the instant
claims may be produced by the recited modification or not," and concludes that "there is no
structure implied by said limitations."
The rejection respectfully is traversed. As discussed above, O'Brien et al. speculates
that the protease domain of TADG-15 could be released in vivo and, if it turns out that it is
released in vivo, the protease domain could serve as a therapeutic target. This is not a teaching,
suggestion or even a hint to produce the protease domain in vitro and use it as a
therapeutic (not a target) or as a diagnostic reagent (not as a target). Nothing taught
or suggested in O'Brien et al. would have led one of ordinary skill in the art to isolate the
protease domain (or a catalytically active fragment thereof) and replace what ends up as a free
Cys with another amino acid.

A. LEGAL STANDARDS - OBVIOUSNESS UNDER 35 U.S.C. § 103(a)

For prima facie obviousness of claimed subject matter to be established under 35 U.S.C.
§103, all the claim limitations must be taught or suggested by the prior art. In re Royka, 490 F.2d
981, 180 USPQ 580 (CCPA 1974). This principle of U.S. law regarding obviousness was not
altered by the recent Supreme Court holding in KSR International Co. v. Teleflex Inc., 127 S.Ct.
1727, 82 USPQ2d 1385 (2007). In KSR, the Supreme Court stated that "Section 103 forbids
issuance of a patent when 'the differences between the subject matter sought to be patented and
the prior art are such that the subject matter as a whole would have been obvious at the time the
invention was made to a person having ordinary skill in the art to which said subject matter
pertains.'" KSR Int'l Co. v. Teleflex Inc., 127 S.Ct. 1727, 1734, 82 USPQ2d 1385, 1391 (2007).

The mere fact that prior art may be modified to produce the claimed product does not 
make the modification obvious unless the prior art suggests the desirability of the 
modification. In re Fritch, 23 U.S.P.Q.2d 1780 (Fed. Cir. 1992); see, also, In re Papesch, 315 
F.2d 381, 137 U.S.P.Q. 43 (CCPA 1963). Further, that which is within the capabilities of one 
skilled in the art is not synonymous with that which is obvious. Ex parte Gerlach, 212 USPQ
471 (Bd. App. 1980).

Furthermore, the Supreme Court in KSR took the opportunity to reiterate a second
long-standing principle of U.S. law: that a holding of obviousness requires the fact finder
(here, the Examiner) to make explicit the analysis supporting a rejection under 35 U.S.C. 103,
stating that "rejections on obviousness cannot be sustained by mere conclusory statements;
instead, there must be some articulated reasoning with some rational underpinning to support
the legal conclusion of obviousness." Id. at 1740-41, 82 USPQ2d at 1396 (citing In re Kahn,
441 F.3d 977, 988, 78 USPQ2d 1329, 1336 (Fed. Cir. 2006)).
While the KSR Court rejected a rigid application of the teaching, suggestion, or
motivation ("TSM") test in an obviousness inquiry, the Court acknowledged the importance
of identifying "a reason that would have prompted a person of ordinary skill in the relevant
field to combine the elements in the way the claimed new invention does" in an obviousness
determination. KSR, 127 S. Ct. at 1731. The Court stated in dicta that, where there is a

"market pressure to solve a problem and there are a finite number of
identified, predictable solutions, a person of ordinary skill has good reason to
pursue the known options within his or her technical grasp. If this leads to the
anticipated success, it is likely the product not of innovation but of ordinary
skill and common sense. In that instance the fact that a combination was
obvious to try might show that it was obvious under § 103."

In a post-KSR decision, PharmaStem Therapeutics, Inc. v. ViaCell, Inc., 491 F.3d
1342 (Fed. Cir. 2007), the Federal Circuit stated that:

an invention would not be invalid for obviousness if the inventor would have
been motivated to vary all parameters or try each of numerous possible
choices until one possibly arrived at a successful result, where the prior art
gave either no indication of which parameters were critical or no direction as
to which of many possible choices is likely to be successful. Likewise, an
invention would not be deemed obvious if all that was suggested was to
explore a new technology or general approach that seemed to be a promising
field of experimentation, where the prior art gave only general guidance as to
the particular form of the claimed invention or how to achieve it.

Furthermore, KSR has not overruled existing case law. See In re Papesch, 315 F.2d
381, 137 USPQ 43 (CCPA 1963); In re Dillon, 919 F.2d 688, 16 USPQ2d 1897 (Fed. Cir.
1991); and In re Deuel, 51 F.3d 1552, 1558-59, 34 USPQ2d 1210, 1215 (Fed. Cir. 1995). "In
cases involving new compounds, it remains necessary to identify some reason that would have
led a chemist to modify a known compound in a particular manner to establish prima facie
obviousness of a new claimed compound." Takeda v. Alphapharm, 492 F.3d 1350 (Fed. Cir.
2007).

The mere fact that prior art may be modified to produce what is claimed does not
make the modification obvious unless the prior art suggests the desirability of the
modification. In re Fritch, 23 U.S.P.Q.2d 1780 (Fed. Cir. 1992); see, also, In re Papesch, 315
F.2d 381, 137 U.S.P.Q. 43 (CCPA 1963). In addition, if the proposed modification or
combination of the prior art would change the principle of operation of the prior art invention
being modified, then the teachings of the references are not sufficient to render the claims
prima facie obvious. In re Ratti, 270 F.2d 810, 123 USPQ 349 (CCPA 1959).
The disclosure of the applicant cannot be used to hunt through the prior art for the
claimed elements and then combine them as claimed. In re Laskowski, 871 F.2d 115, 117, 10
USPQ2d 1397, 1398 (Fed. Cir. 1989). "To imbue one of ordinary skill in the art with
knowledge of the invention in suit, when no prior art reference or references of record convey
or suggest that knowledge, is to fall victim to the insidious effect of a hindsight syndrome
wherein that which only the inventor taught is used against its teacher." W.L. Gore &
Associates, Inc. v. Garlock Inc., 721 F.2d 1540, 1553, 220 USPQ 303, 312-13 (Fed. Cir. 1983).

B. THE REJECTION OF CLAIMS 1, 11-13, 34-36, 40-42, 113 AND 114 UNDER 35
U.S.C. §103(a) SHOULD BE REVERSED BECAUSE THE EXAMINER HAS
FAILED TO ESTABLISH A PRIMA FACIE CASE OF OBVIOUSNESS

1. The teachings of O'Brien 

The teachings of O'Brien are discussed above. O'Brien states that: 

TADG-15 is a highly overexpressed gene in tumors. It is expressed in a 
limited number of normal tissues, primarily tissues that are involved in either
uptake or secretion of molecules e.g. colon and pancreas. TADG-15 is further 
novel in its component structure of domains in that it has a protease catalytic 
domain which could be released and used as a diagnostic and which has the 
potential for a target for therapeutic intervention. 

O'Brien is speculating that the protease domain could be released in vivo and serve as a
therapeutic target, not as a therapeutic agent or diagnostic reagent. O'Brien does not teach
or suggest that the protease domain exists as a single chain even in vivo, and does not teach or
suggest isolating it. In this passage, noted by the Examiner, O'Brien is discussing the
expression of TADG-15 in tumors and other tissues and indicates that it is expressed on the
surface of cells. Because of its structure, the protease domain could be presented on the
surface of cells in vivo, and, thus, "could be released." Since it is overexpressed in tumors, if
released in vivo, it could serve as a diagnostic marker indicating the presence of tumor cells.
Use of its presence in vivo as a diagnostic marker for detection of tumors and/or as a
therapeutic target is not a teaching, suggestion or hint for isolating the protease domain, nor
for producing it as a single-chain polypeptide, nor for modifying it by replacing what would be
a free Cys in a single-chain form with another amino acid.

Thus, O'Brien does not state or hint that the isolated single-chain protease domain
could be used as a therapeutic or as a diagnostic, and certainly does not teach or suggest then
modifying it by replacing a free Cys in the single-chain polypeptide with another amino acid.
Such teaching does not constitute even a hint or suggestion for isolation or production of a
polypeptide consisting only of the single-chain protease domain of an MTSP, nor of a single-chain
protease domain in which the free Cys (which results only by virtue of it being a single
chain) is replaced with another amino acid.

2. Analysis - the Examiner has failed to set forth a case of prima facie obviousness.

Independent Claim 1 

O'Brien does not teach or suggest an isolated single chain protease domain of an 
MTSP polypeptide nor one in which a free Cys residue is replaced with another amino acid, 
such as a serine. There is no teaching or suggestion in O'Brien for preparing a polypeptide 
consisting only of a single-chain protease domain and modifying it by replacing what is a free
Cys in the single-chain form with another amino acid. The Examiner acknowledges that 
O'Brien does not teach a protease domain of an MTSP polypeptide where a free Cys residue 
in the protease domain is replaced with Ser residues. See, for example, the non-final Office 
Action, mailed June 25, 2007 (Exhibit 1), at page 25, which recites: 

The reference O'Brien et al. does not teach a serine protease domain of a MTPSP [sic]
polypeptides wherein free Cys residues have been replaced with Ser residues. 

Even post-KSR, "it remains necessary to identify some reason that would have led a chemist
to modify a known compound in a particular manner to establish prima facie obviousness of a
new claimed compound." Takeda Chem. Indus., Ltd. v. Alphapharm Pty., Ltd., 492 F.3d 1350
(Fed. Cir. 2007).

In this instance, there is no teaching or suggestion in O'Brien for isolating a single 
chain polypeptide consisting only of an MTSP protease domain in which a free Cys is 
replaced with another amino acid. O'Brien provides no teaching or suggestion for isolating 
the protease domain and preparing it as a single chain. O'Brien does not teach or suggest 
replacing any amino acid in the MTSP polypeptide with another amino acid, and provides no 
teaching or suggestion for modifying a single-chain polypeptide having a free Cys residue by 
replacing the free Cys residue with another amino acid. 

For at least the reasons discussed above, O'Brien, alone or in combination with what
was known in the art, does not teach or suggest every element of independent claim 1.
Accordingly, Appellant respectfully submits that claim 1 is not taught or suggested by
O'Brien. Thus, the Examiner has failed to set forth a prima facie case of obviousness of
claim 1. Appellant respectfully submits that the rejection of claim 1 as obvious over O'Brien
is erroneous in law and fact and, therefore, should be reversed.
For the reasons above, O'Brien does not support a prima facie case of obviousness of any of
the dependent claims; additional reasons why O'Brien does not support a prima facie case of
obviousness of each dependent claim are described below.

Dependent Claim 11 

Claim 11 depends from claim 1 and specifies that the MTSP is selected from among
MTSP1, MTSP3, MTSP4 and MTSP6. Claim 11 includes every limitation of claim 1, from
which it depends. Thus, for the reasons discussed above with respect to claim 1, O'Brien,
alone or in combination with what was known in the art, does not teach or suggest every
element of claim 11. Accordingly, Appellant respectfully submits that claim 11 is not taught
or suggested by O'Brien. Thus, the Examiner has failed to set forth a prima facie case of
obviousness of claim 11. Appellant respectfully submits that the rejection of claim 11 as
obvious over O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 12 

Claim 12 depends from claim 1 and specifies that the MTSP protease domain consists
of a sequence of amino acid residues selected from among amino acids 615-855 of SEQ ID NO.
2 (MTSP1), amino acids 205-437 of SEQ ID NO. 4 (MTSP3), the amino acid residues set forth
as SEQ ID NO. 6 (MTSP4) or as amino acids 217-443 in SEQ ID NO. 12 (MTSP6), where the
free Cys is replaced with another amino acid. Claim 12 includes every limitation of claim 1,
from which it depends. Thus, for the reasons discussed above with respect to claim 1, O'Brien,
alone or in combination with what was known in the art, does not teach or suggest every
element of claim 12. Accordingly, Appellant respectfully submits that claim 12 is not taught or
suggested by O'Brien. Thus, the Examiner has failed to set forth a prima facie case of
obviousness of claim 12. Appellant respectfully submits that the rejection of claim 12 as
obvious over O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 13 

Claim 13 depends from claim 1 and specifies that the substantially purified polypeptide
has at least about 95% sequence identity with a protease domain consisting of a sequence of
amino acid residues selected from among amino acids 615-855 of SEQ ID NO. 2, amino acids
205-437 of SEQ ID NO. 4, the amino acids set forth as SEQ ID NO. 6, and amino acids 217-443
in SEQ ID NO. 12. Claim 13 includes every limitation of claim 1, from which it depends.
Thus, for the reasons discussed above with respect to claim 1, O'Brien, alone or in combination
with what was known in the art, does not teach or suggest every element of claim 13. Hence,
Appellant respectfully submits that claim 13 is not taught or suggested by O'Brien. Thus, the
Examiner has failed to set forth a prima facie case of obviousness of claim 13. Appellant
respectfully submits that the rejection of claim 13 as obvious over O'Brien is erroneous in law
and fact and, therefore, should be reversed.

Dependent Claim 34

Claim 34 depends from claim 1 and specifies that the MTSP is selected from among
corin, MTSP1, enteropeptidase, human airway trypsin-like protease (HAT), TMPRSS2, and
TMPRSS4. Claim 34 includes every limitation of claim 1, from which it depends. Thus, for the
reasons discussed above with respect to claim 1, O'Brien, alone or in combination with what was
known in the art, does not teach or suggest every element of claim 34. Accordingly, Appellant
respectfully submits that claim 34 is not taught or suggested by O'Brien. Thus, the Examiner
has failed to set forth a prima facie case of obviousness of claim 34. Appellant respectfully
submits that the rejection of claim 34 as obvious over O'Brien is erroneous in law and fact and,
therefore, should be reversed.

Dependent Claim 35 

Claim 35 is directed to a conjugate that comprises a) a polypeptide of claim 1 and b) 
a targeting agent linked to the protein directly or via a linker, wherein the conjugate has 
serine protease activity. The specification defines a targeting agent as 

any moiety, such as a protein or effective portion thereof, that provides specific binding 
of the conjugate to a cell surface receptor, which, preferably, internalizes the conjugate or 
MTSP portion thereof. A targeting agent may also be one that promotes or facilitates, for 
example, affinity isolation or purification of the conjugate; attachment of the conjugate to 
a surface; or detection of the conjugate or complexes containing the conjugate. 

(e.g., see page 38, lines 9-15).

Claim 35 recites that a targeting agent is linked to the protein of claim 1 directly or
via a linker and that the conjugate has serine protease activity. There is no teaching or
suggestion in O'Brien of conjugating a targeting agent to an isolated single-chain polypeptide
consisting only of an MTSP protease domain in which a free Cys was replaced with another
amino acid.

O'Brien teaches, at col. 9, lines 53-56, covalently linking another polypeptide to an 
intact TADG-15 polypeptide or to a fragment thereof. The cited section states: 

The fragment, or the intact TAGD-15 polypeptide, may be covalently linked to another
polypeptide, e.g., which acts as a label, a ligand, or a means to increase
antigenicity. [emphasis added]
By "fragment" O'Brien means "antigenic fragment" or other fragment (see col. 9, lines 22-32),
which describes fragments as 10 residues, typically 20 residues, and "preferably at least 30
(e.g., 50) residues" in length, and indicates that they can be antigenic fragments for preparing
antibodies. From the context, O'Brien contemplates antigenic fragments. There is no
mention, teaching, suggestion or hint that the fragment is a catalytic domain or fragment
thereof.

O'Brien does not teach or suggest isolating the protease domain of TADG-15 and
conjugating it to another polypeptide. The Examiner alleges that the motivation for making
conjugates is to use them as a diagnostic, which has the potential for a target for therapeutic
intervention (page 23 of the Office Action). Even if there were such a suggestion in O'Brien,
as noted above, there is no teaching or suggestion for isolating the protease domain or a
catalytically active portion thereof and replacing a free Cys residue. Hence, there can be no
motivation to prepare conjugates. Furthermore, as discussed above, O'Brien suggests
isolating antigenic fragments and linking them to another polypeptide, such as a label, a ligand
or a means to increase antigenicity. O'Brien contemplates using antigenic fragments to
make antibodies because the TADG-15 polypeptide is considered a possible therapeutic
target, not a therapeutic agent or a diagnostic agent.

O'Brien teaches that TADG-15 is a highly over-expressed gene in tumors and 
suggests that TADG-15 thus could be a potential target for therapeutic intervention (col. 15, 
lines 31-38). One of ordinary skill in the art would not be led to conjugate a targeting
moiety to a target. O'Brien does not teach, suggest or mention conjugating a targeting agent
to an isolated protease domain. Accordingly, for these reasons and the reasons discussed 
above with respect to claim 1, Appellant respectfully submits that claim 35 is not taught or 
suggested by O'Brien. Thus, the Examiner has failed to set forth a prima facie case of 
obviousness of claim 35. Appellant respectfully submits that the rejection of claim 35 as 
obvious over O'Brien is erroneous in law and fact and, therefore, should be reversed. 

Dependent Claim 36 

Claim 36 depends from claim 35 and recites that the targeting agent permits i) 
affinity isolation or purification of the conjugate; ii) attachment of the conjugate to a surface; 
iii) detection of the conjugate; or iv) targeted delivery to a selected tissue or cell. As 
discussed above, O'Brien does not teach or suggest isolating the protease domain of TADG-15,
replacing a free Cys with another amino acid and conjugating the single-chain protease
domain to a targeting agent. Accordingly, for these reasons and the reasons discussed above
with respect to claim 1, Appellant respectfully submits that claim 36 is not taught or suggested
by O'Brien. Thus, the Examiner has failed to set forth a prima facie case of obviousness of
claim 36. Appellant respectfully submits that the rejection of claim 36 as obvious over
O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 40

Claim 40 is directed to a solid support comprising two or more polypeptides of claim 
1 linked thereto either directly or via a linker. O'Brien does not mention a solid support. 
There is no teaching or suggestion in O'Brien of a solid support that includes two or more 
isolated single-chained polypeptides consisting only of an MTSP protease domain in which a 
free Cys was replaced with another amino acid. In maintaining the rejection, the Examiner 
states that "assays using polypeptides linked to the molecules taught by O'Brien et al. utilize 
solid supports" (page 23 of the Office Action). In the assays described in O'Brien, a
hybridization probe to the nucleotide sequence encoding the TADG-15 polypeptide (such as in a
standard Northern blot assay) or an antibody to the TADG-15 polypeptide (such as in a standard
immunoassay) is attached to a solid support. Appellant respectfully submits that, although
such assays can use solid supports, O'Brien does not teach or suggest an isolated
single-chained polypeptide consisting only of an MTSP protease domain in which a free Cys was
replaced with another amino acid, nor conjugating two or more such isolated protease
domains to a solid support. Accordingly, for these reasons and the reasons discussed above
with respect to claim 1, Appellant respectfully submits that claim 40 is not taught or
suggested by O'Brien. Thus, the Examiner has failed to set forth a prima facie case of
obviousness of claim 40. Appellant respectfully submits that the rejection of claim 40 as
obvious over O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 41 

Claim 41 recites a solid support comprising two or more polypeptides of claim 1 linked 
thereto either directly or via a linker where the polypeptides comprise an array. The 
specification defines an array as a collection of elements containing three or more members. 
As discussed above, O'Brien does not mention a solid support. O'Brien provides no teaching 
or suggestion for isolating the protease domain and preparing it as a single chain. There is no 
teaching or suggestion in O'Brien of a solid support that includes three or more isolated single- 
chained polypeptides consisting only of an MTSP protease domain in which a free Cys was 
replaced with another amino acid. Accordingly, for these reasons and the reasons discussed
above with respect to claim 1, claim 41 is not taught or suggested by O'Brien. Thus, the
Examiner has failed to set forth a prima facie case of obviousness of claim 41. Appellant
respectfully submits that the rejection of claim 41 as obvious over O'Brien is erroneous in law
and fact and, therefore, should be reversed.

Dependent Claim 42

Claim 42 is directed to the solid support of claim 41, wherein the array comprises 
polypeptides having different MTSP protease domains. There is no teaching or suggestion in 
O'Brien of a solid support that includes three or more isolated single-chained polypeptides 
consisting only of an MTSP protease domain in which a free Cys was replaced with another 
amino acid. Further, the only MTSP taught in O'Brien is TAGD-15. There is no teaching or 
suggestion of any other MTSP. Hence, there can be no teaching or suggestion in O'Brien to 
conjugate isolated protease domains from different MTSPs to a solid support to form an 
array. Accordingly, for these reasons and the reasons discussed above with respect to claim 
1, Appellant respectfully submits that claim 42 is not taught or suggested by O'Brien. Thus,
the Examiner has failed to set forth a prima facie case of obviousness of claim 42. Appellant 
respectfully submits that the rejection of claim 42 as obvious over O'Brien is erroneous in 
law and fact and, therefore, should be reversed. 

Dependent Claim 113 

Claim 113 is directed to a solid support comprising two or more polypeptides of claim
12 linked thereto either directly or via a linker. Claim 12 depends from claim 1 and recites 
that the MTSP protease domain consists of a sequence of amino acid residues selected from 
among amino acids 615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID NO. 4, the 
amino acid residues set forth as SEQ ID No. 6 or as amino acids 217-443 in SEQ ID No. 12. 
Claim 12 includes every limitation of claim 1, from which it depends. 

O'Brien does not mention a solid support. Furthermore, there is no teaching or 
suggestion in O'Brien of a solid support that includes two or more isolated single-chain 
polypeptides consisting only of an MTSP protease domain in which a free Cys was replaced 
with another amino acid. Accordingly, for these reasons and the reasons discussed above 
with respect to claim 1, Appellant respectfully submits that claim 113 is not taught or
suggested by O'Brien. Thus, the Examiner has failed to set forth a prima facie case of obviousness
of claim 113. Appellant respectfully submits that the rejection of claim 113 as obvious over
O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 114

Claim 114 depends from claim 113 and is directed to an array. The specification
defines an array as a collection of elements containing three or more members. O'Brien 
provides no teaching or suggestion of an array that includes three or more isolated single- 
chained polypeptides consisting only of an MTSP protease domain in which a free Cys was 
replaced with another amino acid. Accordingly, for these reasons and the reasons discussed 
above with respect to claim 1, claim 114 is not taught or suggested by O'Brien. Thus, the
Examiner has failed to set forth a prima facie case of obviousness of claim 114. Appellant
respectfully submits that the rejection of claim 114 as obvious over O'Brien is erroneous in law
and fact and, therefore, should be reversed. 



Appellant respectfully submits that claim 1 as well as each of claims 11-13, 34-36,
40-42, 113 and 114, which ultimately depend from claim 1 and include every limitation
thereof, are nonobvious and distinguishable from the teachings of O'Brien. Thus, Appellant
respectfully submits that the Examiner has failed to establish claims 1, 11-13, 34-36, 40-42,
113 and 114 as obvious under 35 U.S.C. §103(a) over O'Brien. Accordingly, Appellant
respectfully submits that the rejection of claims 1, 11-13, 34-36, 40-42, 113 and 114 as
obvious over O'Brien is erroneous in law and fact and, therefore, should be reversed.

VIII. CONCLUSIONS 

Appellant respectfully submits that the rejection of claims 1, 11, 20, 34-36, 40-42, 113
and 114 under 35 U.S.C. §112, first paragraph, as allegedly containing subject matter that 
was not described in the specification in such a way as to reasonably convey to one skilled in 
the art that the inventor, at the time the application was filed, had possession of the claimed 
subject matter, is erroneous in law and fact and, therefore, should be reversed. 

Appellant also respectfully submits that the rejection of claims 1, 11, 20, 34-36, 40-42,
113 and 114 under 35 U.S.C. §112, first paragraph, because the specification allegedly
fails to describe the claimed subject matter in such a way as to enable one skilled in the art to 
make and use the claimed subject matter commensurate in scope with these claims, is 
erroneous in law and fact and, therefore, should be reversed. 

Appellant also respectfully submits that the Examiner has failed to establish claims 1,
11-13, 20, 34-36, 40-42, 113 and 114 as anticipated by Takeuchi under 35 U.S.C. §102(b).
Accordingly, Appellant respectfully submits that the rejection of claims 1-3, 19 and 20 as 
anticipated by Takeuchi is erroneous in law and fact and, therefore, should be reversed. 

Appellant also respectfully submits that the Examiner has failed to establish claims 1, 
11-13 and 34 as anticipated by O'Brien under 35 U.S.C. §102(e). Accordingly, Appellant 
respectfully submits that the rejection of claims 1, 11-13 and 34 as anticipated by O'Brien is
erroneous in law and fact and, therefore, should be reversed. 

Appellant further respectfully submits that the Examiner has failed to establish claims 1,
11-13, 34-36, 40-42, 113 and 114 as obvious under 35 U.S.C. §103(a) over O'Brien.
Accordingly, Appellant respectfully submits that the rejection of claims 1, 11-13, 34-36, 40-42,
113 and 114 as obvious over O'Brien is erroneous in law and fact and, therefore, should be
reversed. 



The Director is authorized to charge any fees that may be required, or to credit any 
overpayment to Deposit Account No. 02-1818. Please indicate the Attorney Docket No.

CLAIMS APPENDIX

PENDING CLAIMS ON APPEAL OF 
U.S. PATENT APPLICATION SERIAL NO. 09/776,191 

1. (Rejected) An isolated, substantially purified single-chain polypeptide,
consisting only of a protease domain of a type-II membrane-type serine protease (MTSP) or a
catalytically active fragment thereof as a single chain, wherein:

a free Cys in the protease domain is replaced with another amino acid; and
the MTSP protease domain or catalytically active fragment thereof has serine protease
activity as a single chain.

2.-9. (Cancelled).

10. (Withdrawn) The substantially purified polypeptide of claim 1, wherein the 
MTSP portion has an N-terminus that comprises IVNG, ILGG, VGLL or ILGG. 

11. (Rejected) The substantially purified polypeptide of claim 1, wherein the MTSP
is selected from among MTSP1, MTSP3, MTSP4 and MTSP6.

12. (Rejected) The substantially purified polypeptide of claim 1, wherein the MTSP 
protease domain consists of a sequence of amino acid residues selected from among amino 
acids 615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID NO. 4, the amino acid 
residues set forth as SEQ ID No. 6 or as amino acids 217-443 in SEQ ID No. 12. 

13. (Rejected) The substantially purified polypeptide of claim 1 that has at least about 
95% sequence identity with a protease domain consisting of a sequence of amino acid residues 
selected from among amino acids 615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID
NO. 4, the amino acids set forth as SEQ ID No. 6, and amino acids 217-443 in SEQ ID No. 12. 

Claims 14-19 (Cancelled). 

20. (Rejected) The polypeptide of claim 1, wherein a free Cys in the protease
domain is replaced with a serine. 

Claims 21- 33 (Cancelled). 

34. (Rejected) The polypeptide of claim 1, wherein the MTSP is selected from
among corin, MTSP1, enteropeptidase, human airway trypsin-like protease (HAT),
TMPRSS2, and TMPRSS4. 

35. (Rejected) A conjugate, comprising: 

a) a polypeptide of claim 1, and

b) a targeting agent linked to the protein directly or via a linker, wherein the 
conjugate has serine protease activity. 

36. (Rejected) The conjugate of claim 35, wherein the targeting agent permits

i) affinity isolation or purification of the conjugate; 

ii) attachment of the conjugate to a surface; 

iii) detection of the conjugate; or 

iv) targeted delivery to a selected tissue or cell.

Claims 37-39 (Cancelled).

40. (Rejected) A solid support comprising two or more polypeptides of claim 1 
linked thereto either directly or via a linker. 

41. (Rejected) The support of claim 40, wherein the polypeptides comprise an array.

42. (Rejected) The support of claim 41, wherein the array comprises polypeptides 
having different MTSP protease domains. 

43. (Withdrawn) A method for identifying candidate anti-tumor compounds that 
inhibit the protease activity of an MTSP, comprising: 

contacting a polypeptide of claim 1 with a substrate proteolytically cleaved by the 
MTSP, and, either simultaneously, before or after, adding a test compound or plurality thereof; 
measuring the amount of substrate cleaved in the presence of the test compound; and 
selecting compounds that change the amount cleaved compared to a control, whereby 
compounds that modulate the activity of the MTSP are identified. 

44. (Withdrawn) The method of claim 43, wherein the test compounds are small 
molecules, peptides, peptidomimetics, natural products, antibodies or fragments thereof.

45. (Withdrawn) The method of claim 43, wherein a plurality of the test 
compounds are screened simultaneously. 

46. (Withdrawn) The method of claim 43, wherein the change in the amount 
cleaved is assessed by comparing the amount cleaved in the presence of the test compound 
with the amount in the absence of the test compound. 

47. (Cancelled) 

48. (Withdrawn) The method of claim 43, wherein a plurality of the polypeptides 
are linked to a solid support, either directly or via a linker. 

49. (Withdrawn) The method of claim 43, wherein the polypeptides comprise an array.

50. (Withdrawn) The method of claim 43, wherein the polypeptides comprise a
plurality of different MTSP proteases.

51. (Withdrawn) A method of identifying a compound that specifically binds to a
single chain protease domain of an MTSP, comprising: 

contacting a polypeptide of claim 1 with a test compound or plurality thereof under 
conditions conducive to binding thereof; and 

identifying compounds that specifically bind to the MTSP single chain protease domain or 
compounds that inhibit binding of a compound known to bind to the MTSP single chain 
protease domain, wherein the known compound is contacted with the polypeptide before, 
simultaneously with or after the test compound. 

52. (Withdrawn) The method of claim 51, wherein the polypeptide is linked either
directly or indirectly via a linker to a solid support. 

53. (Withdrawn) The method of claim 51, wherein the test compounds are small 
molecules, peptides, peptidomimetics, natural products, antibodies or fragments thereof.

54. (Withdrawn) The method of claim 51, wherein a plurality of the test substances 
are screened for simultaneously. 

55. (Withdrawn) The method of claim 52, wherein a plurality of the polypeptides 
are linked to a solid support. 

56. -107. (Cancelled). 

108. (Withdrawn) A conjugate, comprising: 

a) an MTSP3 or an MTSP4 or the MTSP6 of claim 12; and 

b) a targeting agent linked to the protein directly or via a linker. 

109. (Withdrawn) The conjugate of claim 108, wherein the targeting agent permits 

i) affinity isolation or purification of the conjugate; 

ii) attachment of the conjugate to a surface; 

iii) detection of the conjugate; or 

iv) targeted delivery to a selected tissue or cell. 
Claims 110-112 (Cancelled). 

113. (Rejected) A solid support comprising two or more polypeptides of claim 12 
linked thereto either directly or via a linker.

114. (Rejected) The support of claim 113, wherein the polypeptides comprise an array.

115. (Withdrawn) A method for identifying compounds that modulate the protease
activity of an MTSP of claim 1, comprising:

contacting the MTSP of claim 1 with a substrate proteolytically cleaved by the MTSP, 
and, either simultaneously, before or after, adding a test compound or plurality thereof; 
measuring the amount of substrate cleaved in the presence of the test compound; and 
selecting compounds that change the amount cleaved compared to a control, whereby 
compounds that modulate the activity of the MTSP are identified. 

116. (Withdrawn) The method of claim 115, wherein the test compounds are small
molecules, peptides, peptidomimetics, natural products, antibodies or fragments thereof. 

117. (Cancelled). 

118. (Withdrawn) The method of claim 115, wherein the change in the amount 
cleaved is assessed by comparing the amount cleaved in the presence of the test compound 
with the amount in the absence of the test compound. 

119. (Withdrawn) The method of claim 115, wherein a plurality of the test substances 
are screened for simultaneously. 

120. (Withdrawn) The method of claim 119, wherein a plurality of the polypeptides 
are linked to a solid support. 

121. (Cancelled). 

122. (Withdrawn) A method of identifying a compound that specifically binds to an 
MTSP protease domain, comprising: 

contacting an MTSP protease domain of claim 12 with a test compound or plurality thereof 
under conditions conducive to binding thereof; and 

identifying compounds that specifically bind to the MTSP. 

123. (Withdrawn) The method of claim 122, wherein the polypeptide is linked either 
directly or indirectly via a linker to a solid support. 

124. (Withdrawn) The method of claim 122, wherein the test compounds are small 
molecules, peptides, peptidomimetics, natural products, antibodies or fragments thereof. 

125. (Withdrawn) The method of claim 122, wherein a plurality of the test substances 
are screened for simultaneously.

126. (Withdrawn) The method of claim 125, wherein a plurality of the polypeptides
are linked to a solid support.

127.-137. (Cancelled).

DETAILED ACTION

This application is a CIP of 09/657,986, now issued as U.S. Patent No. 
6,797,504. 

The amendment filed on December 26, 2007, amending claim 1 and canceling 
claims 2-3 and 19, has been entered. 

Claims 1, 10-13, 20, 34-36, 40-46, 48-55, 108-109, 113-116, 118-120 and 122- 
126 are pending. Claims 10, 43-46, 48-55, 108-109, 115-116, 118-120 and 122-126
are withdrawn. Claims 1, 11-13, 20, 34-36, 40-42 and 113-114 are under consideration. 

Priority 

Applicant's claim for domestic priority under 35 U.S.C. 119(e) is acknowledged.
However, the provisional applications upon which priority is claimed fail to provide
adequate support under 35 U.S.C. 112 for claims 11-13 and 34 of this application.

Provisional applications 60/179,982, 60/183,542, 60/213,124, 60/220,970 and 
60/234,840 fail to provide adequate support for polypeptides comprising the serine 
protease domain of MTSP1. Provisional applications 60/179,982 and 60/183,542
describe polypeptides related to MTSP3, and provisional applications 60/213,124,
60/220,970 and 60/234,840 describe polypeptides related to MTSP4.

Therefore, the effective filing date for purposes of prior art is the filing date of
09/657,986, which is 9/8/2000. 

The information disclosure statement (IDS) submitted on December 26, 2007
was filed after the mailing date of the Non-Final Rejection on June 25, 2007. The
submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the
information disclosure statement is being considered by the examiner. 

Response to Arguments 

Applicant's amendment and arguments filed on December 26, 2007, have been 
fully considered and are deemed to be persuasive to overcome some of the rejections 
previously applied. Rejections and/or objections not reiterated from previous office 
actions are hereby withdrawn. 

Claim Objections 

Applicants argue that claims 11-13 and 34 should be retained pending a
determination of the allowability of claim 1, which is a linking claim, linking the elected
subject matter. In view of applicant's argument, the objection to claims 11-13 and 34
has been withdrawn.

Claim Rejections - 35 USC §112, 2nd paragraph

In view of applicant's argument, the rejection of claims 1, 11-13 and claims 20, 
34-36, 40-42 and 113-114 depending therefrom under 35 U.S.C. 112, second
paragraph, as being indefinite for failing to particularly point out and distinctly claim the
subject matter which applicant regards as the invention has been withdrawn. 



Claim Rejections - 35 USC §112, 1st paragraph

The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 

Claims 1, 11, 20, 34-36, 40-42 and 113-114 are rejected under 35 U.S.C. 112, 
first paragraph, as containing subject matter which was not described in the 
specification in such a way as to reasonably convey to one skilled in the relevant art that 
the inventor(s), at the time the application was filed, had possession of the claimed 
invention. 

Claims 1, 11, 20, 34-36, 40-42 and 113-114 are drawn to a polypeptide
consisting of a protease domain or catalytically active fragment thereof of type-II
membrane-type serine protease (MTSP) from any source. Claims 11 and 34 limit the
MTSP polypeptide to a MTSP1 polypeptide from any source. Therefore, these claims 
are drawn to a genus of polypeptides having any structure. The specification only 
teaches four species: amino acids 615-855 of SEQ ID NO:2 (MTSP1), amino acids
205-437 of SEQ ID NO:4 (MTSP3), the amino acids of SEQ ID NO:6 (MTSP4) and amino
acids 217-443 of SEQ ID NO:11 (MTSP6). These species are not enough to describe
the whole genus and there is no evidence on the record of the relationship between the
structure of the above catalytically active protease domains of SEQ ID NOs: 2, 4, 6 and
11 and the structure of the serine protease domain of any or all MTSP polypeptides or
MTSP1 polypeptides. Further, the specification does not describe the structure of a 
catalytically active fragment of a protease domain of any or all MTSP polypeptide. 
Therefore, the specification fails to describe a representative species of the genus of 
polypeptides consisting of a serine protease domain or a catalytically active portion of a 
MTSP polypeptide. 

Given this lack of description of the representative species encompassed by the 
genus of the claims, the specification fails to sufficiently describe the claimed invention 
in such full, clear, concise, and exact terms that a skilled artisan would recognize that 
applicants were in possession of the inventions of claims 1, 11, 20, 34-36, 40-42 and
113-114. 

Applicant is referred to the revised guidelines concerning compliance with the 
written description requirement of 35 U.S.C. 112, first paragraph, published in the Official
Gazette and also available at www.uspto.gov.

In response to the previous Office Action, applicants have traversed the above 
rejection. 

Applicants argue that the claims are fully described because the specification 
identified 17 members of the MTSP family and identifies the protease domains thereof, 
unknown MTSPs and its protease domains. Examiner respectfully disagrees. The 
claims are not limited to specific protease domains of specific MTSP proteins, but the 
claims are drawn to polypeptides consisting of any protease domains or any or all 
catalytically active fragments of said protease domains of any or all MTSP or any or all
MTSP1, including any or all recombinants, variants and mutants of said MTSP or
MTSP1. The recitation of "protease domain of a MTSP" or "MTSP1" fails to provide a
sufficient description of the claimed genus of polypeptides as it merely describes the 
functional features of the genus without providing any definition of the structural features 
of the species within the genus. The CAFC in UC California v. Eli Lilly (43 USPQ2d
1398) stated that: "in claims to genetic material, however, a generic statement such as
'vertebrate insulin cDNA' or 'mammalian insulin cDNA,' without more, is not an
adequate written description of the genus because it does not distinguish the claimed 
genus from others, except by function. It does not specifically define any of the genes 
that fall within its definition. It does not define any structural features commonly 
possessed by members of the genus that distinguish them from others. One skilled in 
the art therefore cannot, as one can do with a fully described genus, visualize or 
recognize the identity of the members of the genus." Similarly with the claimed genus of 
protease domains, the functional definition of the genus does not provide any structural 
information commonly possessed by members of the genus which distinguish the 
species within the genus from other proteins such that one can visualize or recognize 
the identity of the members of the genus. 

Further, as discussed in the written description guidelines, the written description 
requirement for a claimed genus may be satisfied through sufficient description of a 
representative number of species by actual reduction to practice, reduction to drawings, 
or by disclosure of relevant, identifying characteristics, i.e., structure or other physical 



Application/Control Number: 09/776,191 Page 7 

Art Unit: 1652 

and/or chemical properties, by functional characteristics coupled with a known or 
disclosed correlation between function and structure, or by a combination of such 
identifying characteristics, sufficient to show the applicant was in possession of the 
claimed genus. A representative number of species means that the species which are 
adequately described are representative of the entire genus. Thus, when there is 
substantial variation within the genus, one must describe a sufficient variety of 
species to reflect the variation within the genus. Satisfactory disclosure of a 
representative number depends on whether one of skill in the art would recognize that 
the applicant was in possession of the necessary common attributes or features of the 
elements possessed by the members of the genus in view of the species disclosed. For 
inventions in an unpredictable art, adequate written description of a genus which 
embraces widely variant species cannot be achieved by disclosing only one species 
within the genus. In the instant case the claimed genera of the claims are drawn to 
species which are widely variant in structure. The genus of the claims is structurally
diverse as it encompasses any catalytically active protease domains of any or all MTSP
or MTSP1, excepting having serine protease activity. As such, neither the description of
solely structural features present in all members of the genus is sufficient to be 
representative of the attributes and features of the entire genus. 

Applicants also argue that the claims are fully described because members of 
the MTSP family of serine proteases were well known at the time of filing, such as 
conserved characteristic structural elements and protease domains, and methods of
identifying serine protease domains were known in the art. Examiner respectfully
disagrees. As discussed above, the claims are not drawn to the specific protease
domains of specific MTSP type II, but to polypeptides consisting of any protease 
domains or any or all catalytically active fragments of said protease domains of any or 
all MTSP or any or all MTSP1, including any or all recombinants, variants and mutants
of said MTSP or MTSP1. In view of the widely variant species encompassed by the
genus, the species disclosed in the specification are not enough and do not constitute
a representative number of species to describe the whole genus of any or all variants, 
recombinant and mutants of any or all polypeptides having serine protease activity 
isolated from any or all source, including any or all variants, recombinants and mutants 
thereof, and there is no evidence on the record of the relationship between the structure 
of the protease domain of the specific MTSPs disclosed in the specification and the 
structure of any or all recombinant, variant and mutant of any or all polypeptides having 
serine protease activity. Therefore, the specification fails to describe a representative 
species of the genus comprising any or all polypeptides having serine protease activity, 
including any or all variants, recombinants and mutants thereof. 

Applicants also argue that the claims are fully described by the specification 
because one skilled in the art would recognize applicant's possession of the claimed 
subject matter. Examiner respectfully disagrees. As discussed above, the claims are 
not drawn to the specific protease domains of specific MTSP type II, but to polypeptides 
consisting of any protease domains or any or all catalytically active fragments of said 
protease domains of any or all MTSP or any or all MTSP1, including any or all 
recombinants, variants and mutants of said MTSP or MTSP1. The claimed genera of
the claims are drawn to species which are widely variant in structure. The genus of the
claims is structurally diverse as it encompasses any catalytically active protease
domains of any or all MTSP or MTSP1, excepting having serine protease activity. As
such, neither the description of solely structural features present in all members of the
genus is sufficient to be representative of the attributes and features of the entire genus.
Hence the rejection is maintained. 

Claims 1, 11, 20, 34-36, 40-42 and 113-114 are rejected under 35 U.S.C. 112,
first paragraph, because the specification, while being enabling for a polypeptide 
consisting of amino acids 615-855 of SEQ ID NO:2, does not reasonably provide 
enablement for a polypeptide consisting of any protease domain of any type II 
membrane type serine protease (MTSP) or MTSP1 or a catalytically active portion 
thereof. The specification does not enable any person skilled in the art to which it
pertains, or with which it is most nearly connected, to make and use the invention 
commensurate in scope with these claims. 

Factors to be considered in determining whether undue experimentation is 
required are summarized in In re Wands, 858 F.2d 731, 8 USPQ2d 1400 (Fed. Cir.
1988). They include (1) the quantity of experimentation necessary, (2) the amount of
direction or guidance presented, (3) the presence or absence of working examples, (4) 
the nature of the invention, (5) the state of the prior art, (6) the relative skill of those in 
the art, (7) the predictability or unpredictability of the art, and (8) the breadth of the 
claims. 

Claims 1, 11, 20, 35-36, 40-42 and 113-114 are drawn to a polypeptide
consisting of a protease domain or catalytically active fragment thereof of a type-II
membrane-type serine protease (MTSP) from any source. Claims 11 and 34 limit the
MTSP polypeptide to a MTSP1 polypeptide from any source. Therefore, these claims 
are drawn to polypeptides having undefined structure. 

The scope of the claims is not commensurate with the enablement provided by 
the disclosure with regard to the extremely large number of polypeptides comprising a 
protease or catalytically active domain broadly encompassed by the claims. Since the 
amino acid sequence of a protein determines its structural and functional properties, 
predictability of which changes can be tolerated in a protein's amino acid sequence and 
obtain the desired activity requires a knowledge of and guidance with regard to which 
amino acids in the protein's sequence, if any, are tolerant of modification and which are 
conserved (i.e. expectedly intolerant to modification), and detailed knowledge of the 
ways in which the protein's structure relates to its function. However, in this case the
disclosure is limited to the polypeptide comprising amino acids 615-855 of SEQ ID 
NO:2, or the amino acids of SEQ ID NO:50. 

It would require undue experimentation of the skilled artisan to make and use the 
claimed polypeptides. The specification is limited to teaching the use of polypeptide 
consisting of amino acids 615-855 of SEQ ID NO:2 or the amino acids of SEQ ID NO:50
but provides no guidance with regard to the making of variants and mutants or with 
regard to other uses. In view of the great breadth of the claim, amount of 
experimentation required to make the claimed polypeptides, the lack of guidance, 
working examples, and unpredictability of the art in predicting function from a
polypeptide primary structure, the claimed invention would require undue 
experimentation. As such, the specification fails to teach one of ordinary skill how to 
use the full scope of the polypeptides encompassed by the claims. 

While enzyme isolation techniques, recombinant and mutagenesis techniques 
are known, and it is routine in the art to screen for multiple substitutions or multiple 
modifications as encompassed by the instant claims, the specific amino acid positions 
within a protein's sequence where amino acid modifications can be made with a 
reasonable expectation of success in obtaining the desired activity/utility are limited in 
any protein and the result of such modifications is unpredictable. In addition, one skilled 
in the art would expect any tolerance to modification for a given protein to diminish with 
each further and additional modification, e.g. multiple substitutions. 

The specification does not support the broad scope of the claims which 
encompass all modifications and variants of a protease or catalytically active domain or 
modifications of amino acids 615-855 of SEQ ID NO:2 because the specification does
not establish: (A) regions of the protein structure which may be modified without 
affecting MTSP/serine protease activity; (B) the general tolerance of MTSP to 
modification and extent of such tolerance; (C) a rational and predictable scheme for 
modifying any amino acid residue with an expectation of obtaining the desired biological 
function; and (D) the specification provides insufficient guidance as to which of the 
essentially infinite possible choices is likely to be successful. 

Thus, applicants have not provided sufficient guidance to enable one of ordinary
skill in the art to make and use the claimed invention in a manner reasonably correlated 
with the scope of the claims broadly including protease or catalytically active domains of 
MTSP with an enormous number of amino acid modifications of the MTSP polypeptides 
and of amino acids 615-855 of SEQ ID NO:2. The scope of the claims must bear a 
reasonable correlation with the scope of enablement {In re Fisher, 166 USPQ 19 24 
(CCPA 1970)). Without sufficient guidance, determination of the serine protease 
domain or the catalytically active domain of MTSP having the desired biological 
characteristics is unpredictable and the experimentation left to those skilled in the art is 
unnecessarily, and improperly, extensive and undue. See In re Wands 858 F.2d 731 , 8 
USPQ2nd 1400 (Fed. Cir, 1988). 

In response to the previous Office Action, applicants have traversed the above 
rejection. 

Applicants argue that the claims are enabled because the level of skill in the art 
is high and the specification teaches that MTSP polypeptides constitute a recognized,
well-known and well-characterized family of serine proteases, and the specification
describes the protease domain of a number of MTSP family members, such as 
conserved features of MTSP protease domains. Examiner respectfully disagrees. The 
scope of the claims, which are drawn to polypeptides consisting of any protease 
domains or any or all catalytically active fragments of said protease domains of any or 
all MTSP or any or all MTSP1, including any or all recombinants, variants and mutants 
of said MTSP or MTSP1, is not commensurate with the enablement provided by the
disclosure with regard to the extremely large number of polypeptides comprising a
protease or catalytically active domain broadly encompassed by the claims. Even
though the structures of some MTSPs are known, the claims are drawn to any or all serine
protease domains and catalytically active fragments of any or all protease domains of any
or all MTSP or MTSP1. As discussed above, predictability of which changes can be tolerated
in a protein's amino acid sequence and obtain the desired activity requires a specific
knowledge of and guidance with regard to which specific amino acids in the protein's
sequence can be modified such that the modified polypeptide continues to have said
claimed activity. It is this specific guidance that applicants do not provide. While the art
may teach in general the structure of MTSP conserved amino acid sequences, protease
domains, X-ray crystal structures, etc., such teachings will not reduce the burden of
undue experimentation on those of ordinary skill in the art.

Applicants also argue that the claims are enabled because the knowledge of those
skilled in the art regarding MTSP proteins is high. The Examiner respectfully
disagrees. The claims are drawn to polypeptides consisting of any protease domains or
any or all catalytically active fragments of said protease domains of any or all MTSP or
any or all MTSP1, including any or all recombinants, variants and mutants of said MTSP
or MTSP1. Since the amino acid sequence of the protein determines its structural and
functional properties, predictability of which changes can be tolerated in a protein's 
amino acid sequence and obtain the desired activity requires a knowledge of and 
guidance with regard to which amino acids in the protein's sequence, if any, are tolerant 
of modification and which are conserved (i.e. expectedly intolerant to modification), and 



Application/Control Number: 09/776,191 Page 14 

Art Unit: 1652 

detailed knowledge of the ways in which the protein's structure relates to its function. In
addition, the art does not provide any teaching or guidance as to which amino acids 
within a serine protease can be modified and which ones are conserved such that one 
of skill in the art can make the recited polypeptides having serine protease activity and 
the general tolerance of serine proteases to structural modifications and the extent of 
such tolerance. The art clearly teaches that changes in a protein's amino acid 
sequence to obtain the desired activity without any guidance/knowledge as to which 
amino acids in a protein are required for that activity is highly unpredictable. At the time 
of the invention, there was a high level of unpredictability associated with altering a 
polypeptide sequence with an expectation that the polypeptide will maintain the desired 
activity. For example, Branden et al. (Introduction to Protein Structure, Garland 
Publishing Inc., New York, page 247, 1991 - cited previously on form PTO-892) teach 
that (1) protein engineers are frequently surprised by the range of effects caused by
single mutations that they hoped would change only one specific and simple property in 
enzymes, (2) the often surprising results obtained by experiments where single 
mutations are made reveal how little is known about the rules of protein stability, and (3) 
the difficulties in designing de novo stable proteins with specific functions. 

Applicants argue that the specification discloses working examples; thus, a
person skilled in the art has sufficient guidance for making the claimed polypeptides.
Examiner respectfully disagrees. Even though the structures of some MTSPs are taught,
the claims are drawn not only to polypeptides consisting of catalytically active fragments
of MTSP1, MTSP3, MTSP4 and MTSP6, but also to any or all mutants, variants and
recombinants of any MTSP. Without specific guidance, those skilled in the art will be
subjected to undue experimentation in making and testing each of the enormously large
number of mutants that results from such experimentation. While the art may teach in
general the structure of MTSP, conserved amino acid sequences, etc., such
teachings will not reduce the burden of undue experimentation on those of ordinary skill
in the art.

Hence the rejection is maintained. 



Claim Rejections - 35 USC § 102 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

(e) the invention was described in (1) an application for patent, published under section 122(b), by
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 



Claims 1-3 and 19-20 were rejected under 35 U.S.C. 102(b) as being anticipated 
by Dawson et al. 

In view of the fact that Dawson et al. do not teach an isolated serine protease 
domain of a MTSP protein, the rejection has been withdrawn. 

Claims 1, 11-13, 20, 34-36, 40-42 and 113-114 are rejected under 35
U.S.C. 102(b) as being anticipated by Takeuchi et al. 

Claims 1, 11-13, 20 and 34 are drawn to a polypeptide consisting of a serine
protease domain of MTSP having the characteristics recited in the claims. Claims
35-36 are drawn to a conjugate comprising a polypeptide comprising a serine protease
domain of MTSP and a targeting agent. Claims 40-42 and 113-114 are drawn to a
solid support comprising a polypeptide comprising a serine protease domain of MTSP.

Takeuchi et al. (Reference I J : PTO-1449) teaches a polypeptide comprising a 
fragment consisting of a serine protease domain that is 100% identical to amino acids
615-855 of SEQ ID NO:2 of the instant invention (page 11060, 2nd full paragraph).
Takeuchi et al. discloses a purified activated protease domain, comprising amino acids 
615-855 of SEQ ID NO:2, confirmed by an N-terminal sequence of the purified, 
activated protease domain yielding the expected VVGGT sequence (Figure 3 and right
column on page 11057). 

Takeuchi et al. teaches a catalytically active polypeptide comprising the serine
protease domain linked to a His-tag (page 11055, 3rd full paragraph; page 11057, 4th full
paragraph). Takeuchi et al. also teaches a solid support comprising said polypeptide
(page 11057, 4th full paragraph and Figure 5). Therefore, the teaching of Takeuchi et 
al. anticipates claims 1, 11-13, 20, 34-36, 40-42 and 113-114. 



Examiner notes that the contents of the reference were made public at the 
National Academy of Sciences colloquium held February 20-21, 1999 (see top of 
reference). 

In response to the previous Office Action, applicants have traversed the above 
rejections. 

Applicants argue that Takeuchi et al. does not anticipate the instant claims 
because the instant claims are drawn to a polypeptide that consists of a protease 
domain or catalytically active portion thereof. Examiner respectfully disagrees. In 
addition to the full-length MT-SP1, Takeuchi et al. also discloses a polypeptide
consisting of the serine protease domain. The serine protease domain is initially
expressed in E. coli as a His-tagged fusion, but a renatured active protein lacking the
His tag was isolated, and N-terminal sequencing of this protein yielded VVGGT, which
corresponds to residues 615-619 of SEQ ID NO:2 of the instant invention. Takeuchi et
al. discloses that Cys at position 731 forms a disulfide bond with Cys 604 present in the
pro domain (page 11060). Since the serine protease domain of Takeuchi et al. lacks
the pro domain of the wildtype protein, the Cys residue at position 731 of said serine
protease domain does not form a disulfide bond and therefore is a "free cysteine". The
specification on page 58 states that in "the single chain form, the residue at 731 in the 
protease domain is free" (page 58, lines 15-16). Therefore, the serine protease domain 
of Takeuchi et al. is a single chain polypeptide. 

Applicants also argue that the claims are not anticipated by Takeuchi et al.
because Takeuchi et al. does not disclose replacing a free Cys residue of the serine
protease domain of an MTSP polypeptide with another amino acid or a serine residue.
Examiner respectfully disagrees. The limitations "a free Cys in the protease domain is
replaced with another amino acid" and "a free Cys in the protease domain is replaced
with a serine" are product-by-process type limitations. The end result of the products of
the claims is a serine protease domain or a serine protease domain having a serine
residue. Whether the product of the claimed protein is obtained by replacing a free
cysteine residue or not, the product is still the same because the instant claims may be
produced by the recited modification or not. Therefore, there is no structure
implied by said limitations. Since the polypeptide of Takeuchi et al. consists of a 
protease domain of a MTSP and the MTSP protease domain has serine protease 
activity, the claims are anticipated by the prior art. Also, since the serine protease 
domain of Takeuchi et al. has a serine residue, claim 20 is also anticipated. 
Hence the rejections are maintained. 

Claim Rejections - 35 USC § 102/103 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

The following is a quotation of 35 U.S.C. 103(a), which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set
forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior 
art are such that the subject matter as a whole would have been obvious at the time the invention was made to 
a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be 
negatived by the manner in which the invention was made. 

Claims 1, 11-13 and 34 are rejected under 35 U.S.C. 103(a) as being obvious over
O'Brien et al.

Claims 1, 11-13 and 34 are drawn to a polypeptide comprising a serine protease 
domain of MTSP. 

O'Brien et al. (U.S. Patent No. 5,972,616; reference P, PTO-1449) teaches a
polypeptide having 100% identity to the full length MTSP1 of SEQ ID NO:2 of the instant 
invention (SEQ ID NO:2, columns 19-24). O'Brien et al. teaches a serine protease 
domain having proteolytic activity that is 100% identical to amino acids 615-855 of SEQ 
ID NO:2 (Figure 2, Figure 10 and SEQ ID NO:14). Further, O'Brien et al. teaches a 
method of expressing polypeptides via a vector in host cells. O'Brien et al. also teaches 
that the protease domain could be released and be used as a diagnostic which has the 
potential for a target for therapeutic intervention (Column 15, lines 35-38). Therefore, it 
would have been obvious to one having ordinary skill in the art at the time the invention 
was made to express the protease domain of SEQ ID NO:14 and purify the polypeptide.
The motivation for making such a polypeptide is to use it as a diagnostic which has the
potential for a target for therapeutic intervention. One of ordinary skill in the art would 
have had a reasonable expectation of success since expression of a heterologous 
polypeptide is routine in the art and O'Brien et al. teaches how to express heterologous 
polypeptides. 

Therefore, the above reference renders claims 1, 11-13 and 34 prima facie
obvious to one of ordinary skill in the art. 

In response to the previous Office Action, applicants have traversed the above 
rejections. 

Applicants also argue that one of skill in the art would recognize the disclosure of 
the polypeptide of O'Brien as not disclosing a single chain polypeptide. Examiner 
respectfully disagrees. Takeuchi et al. discloses that Cys at position 731 forms a 
disulfide bond with Cys 604 present in the pro domain (page 11060). Since the serine 
protease domain of Takeuchi et al. lacks the pro domain of the wildtype protein, Cys 
residue at position 731 of said serine protease domain does not form a disulfide bond 
and therefore is a "free cysteine". The specification on page 58 states that in "the single 
chain form, the residue at 731 in the protease domain is free" (page 58, lines 15-16). 
Therefore, the serine protease domain of O'Brien et al. is a single chain polypeptide. 

Applicants also argue that the claims are not anticipated by O'Brien et al. 
because O'Brien et al. does not disclose replacing a free Cys residue of the serine
protease domain of an MTSP polypeptide with another amino acid. Examiner 
respectfully disagrees. The limitation "a free Cys in the protease domain is replaced 
with another amino acid" is a product-by-process type limitation. The end result of the 
products of the claims is a serine protease domain. Whether the product of the claimed 
protein is obtained by replacing a free cysteine residue or not, the product is still the 
same because the instant claims may be produced by the recited modification or not. 
Therefore, there is no structure implied by said limitations. Since the
polypeptide of O'Brien et al. consists of a protease domain of a MTSP and the MTSP
protease domain has serine protease activity, the claims are anticipated by the prior art. 

Applicants also argue that O'Brien et al. provides no teaching or suggestion of 
smaller fragments having serine protease activity because it does not teach how to 
make a single chain polypeptide that has serine protease activity. Examiner respectfully 
disagrees. O'Brien et al. teaches a method of expressing polypeptides via a vector in 
host cells. It is well within the skill available in the art to purify the protease domain 
since O'Brien et al. identifies the protease domain. Therefore, it would have been
obvious to one having ordinary skill in the art at the time the invention was made to
express the protease domain of SEQ ID NO:14 and purify the polypeptide. The
motivation for making such a polypeptide is to use it as a diagnostic which has the
potential for a target for therapeutic intervention. One of ordinary skill in the art would 
have had a reasonable expectation of success since expression of a heterologous 
polypeptide is routine in the art and O'Brien et al. teaches how to express heterologous 
polypeptides. Further, since the serine protease domain of Takeuchi et al. lacks the 
pro domain of the wildtype protein, Cys residue at position 731 of said serine protease 
domain does not form a disulfide bond and therefore is a "free cysteine". The 
specification on page 58 states that in "the single chain form, the residue at 731 in the 
protease domain is free" (page 58, lines 15-16). Also, as discussed previously, the 
limitation "a free Cys in the protease domain is replaced with another amino acid" is a 
product-by-process type limitation. The end result of the products of the claims is a 
serine protease domain. Whether the product of the claimed protein is obtained by 
replacing a free cysteine residue or not, the product is still the same because the instant
claims may be produced by the recited modification or not. Therefore, there is no
structure implied by said limitations. Therefore, the serine protease domain of O'Brien
et al. is a single chain polypeptide. 

Hence the rejections are maintained. 

Claims 35-36, 40-42 and 113-114 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over O'Brien et al. 

Claims 35-36 are drawn to a conjugate comprising a polypeptide comprising a 
serine protease domain of MTSP and a targeting agent. Claims 40-42 and 113-114 are 
drawn to a solid support comprising a polypeptide comprising a serine protease domain 
of MTSP. 

O'Brien et al. (U.S. Patent No. 5,972,616; reference P, PTO-1449) teaches a
polypeptide having 100% identity to the full length MTSP1 of SEQ ID NO:2 of the instant
invention, as discussed above. O'Brien et al. also teaches that the protease domain
could be released and used as a diagnostic which has the potential for a target for
therapeutic intervention (Column 15, lines 35-38). 

O'Brien et al. also teaches a method of making fragments of SEQ ID NO:2
(Column 9, lines 22-55). O'Brien et al. teaches said fragments linked to another 
polypeptide (Column 9, lines 54-55) and conjugated to bridging molecules (Column 6,
lines 27-39) for detecting the polypeptide. Assays using polypeptides linked to the
molecules taught by O'Brien et al. utilize solid supports. 

Therefore, it would have been obvious to one having ordinary skill in the art at
the time the claimed invention was made to make a polypeptide comprising the
serine protease domain of SEQ ID NO:2 taught by O'Brien et al. and to make
conjugates and solid supports comprising a polypeptide comprising the serine
protease domain of SEQ ID NO:2. The motivation for making such a polypeptide is to
use it as a diagnostic which has the potential for a target for therapeutic intervention.
The motivation for making conjugates and solid supports comprising said polypeptide
is to use the conjugates and solid supports in a variety of diagnostic assays. One of
ordinary skill in the art would have had a reasonable expectation of success because
making fragments of a polypeptide is routine in the art and O'Brien et al. teaches how
to make fragments of SEQ ID NO:2. One of ordinary skill in the art would also have had
a reasonable expectation of success because diagnostic assays using conjugates and
solid supports comprising a polypeptide are very well known, as taught by O'Brien et al.

Therefore, the above reference renders claims 35-36 and 40-42 prima facie
obvious to one of ordinary skill in the art. 

In response to the previous Office Action, applicants have traversed the above
rejection; these arguments have been addressed above.

Hence the rejection is maintained. 

The rejection of claims 19-20 under 35 U.S.C. 103(a) as being unpatentable over
O'Brien et al. and Estell et al. in view of Takeuchi et al. has been withdrawn.



Conclusion 

None of the claims are in condition for allowance. 



THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time
policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 



Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Yong Pak whose telephone number is 571-272-0935. 
The examiner can normally be reached 6:30 A.M. to 5:00 P.M. Monday through 
Thursday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Nashaat Nashed, can be reached at 571-272-0934. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300. 

Any inquiry of a general nature or relating to the status of this application or 
proceeding should be directed to the receptionist whose telephone number is
571-272-1600.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should
you have questions on access to the Private PAIR system, contact the Electronic
Business Center (EBC) at 866-217-9197 (toll free).

DETAILED ACTION

The petition of March 23, 2007 is being treated as a request for reconsideration. 
In view of said request, the finality of the previous Office action is withdrawn, rendering 
the petition moot. A new action on the merits is set forth below. 

This application is a CIP of 09/657,986, now issued as U.S. Patent No. 
6,797,504.

The amendment filed on October 23, 2006, amending claims 1, 12, 13 and 19
and canceling claim 5, has been entered. 

Claims 1-3, 10-13, 19-20, 34-36, 40-46, 48-55, 108-109, 113-116, 118-120 and
122-126 are pending. Claims 10, 43-46, 48-55, 108-109, 115-116, 118-120 and
122-126 are withdrawn. Claims 1-3, 11-13, 19-20, 34-36, 40-42 and 113-114 are under
consideration.

Priority 

Applicant's claim for domestic priority under 35 U.S.C. 119(e) is acknowledged.
However, the provisional applications upon which priority is claimed fail to provide
adequate support under 35 U.S.C. 112 for claims 11-13 and 34 of this application.

Provisional applications 60/179,982, 60/183,542, 60/213,124, 60/220,970 and
60/234,840 fail to provide adequate support for polypeptides comprising the serine
protease domain of MTSP1. Provisional applications 60/179,982 and 60/183,542
describe polypeptides related to MTSP3, and provisional applications 60/213,124,
60/220,970 and 60/234,840 describe polypeptides related to MTSP4.

Therefore, the effective filing date for purposes of prior art is the filing date of
09/657,986, which is 9/8/2000.

Response to Arguments 

Applicant's amendment and arguments filed on October 23, 2006, have been
fully considered and are deemed to be persuasive to overcome the rejections previously 
applied. Rejections and/or objections not reiterated from previous office actions are 
hereby withdrawn. 

Claim Objections 

Claims 11-13 and 34 are objected to as being drawn to non-elected subject matter.
In response to the previous Office Action, applicants have traversed the above objection.
Applicants argue that claims 11-13 and 34 should be retained pending a determination
of the allowability of claim 1, which is a linking claim, linking the elected subject matter.
Since claim 1 has not been indicated as allowable, the objection is maintained.

Claim Rejections - 35 USC §112 

The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

Claims 1-3, 11-12, 13 and claims 19-20, 34-36, 40-42 and 113-114 depending
therefrom are rejected under 35 U.S.C. 112, second paragraph, as being indefinite for
failing to particularly point out and distinctly claim the subject matter which applicant 
regards as the invention. 

Claims 1-3, 11-12, 13 recite the phrase "substantially purified single-chain
polypeptide". The metes and bounds of the phrase in the context of the above claims 
are not clear to the Examiner. It is not clear to the Examiner what is considered as 
"substantially purified" by the applicants. A perusal of the specification did not provide a 
clear definition for the above phrase. Without a clear definition, those skilled in the art 
would be unable to conclude if a polypeptide is a "substantially purified" polypeptide 
without knowing the metes and bounds of the phrase. Examiner requests clarification of 
the above phrase. 

In response to the previous Office Action, applicants have traversed the above 
rejection. 

Applicants argue that when read in light of the specification, the skilled artisan 
would understand the meaning of the recitation "substantially purified" and point to
page 46, lines 4-15 of the specification for the definition of the phrase "substantially 
purified". Examiner respectfully disagrees. The specification on page 46, lines 4-15, 
does not define what applicants mean by "substantially purified", but only describes that 
"substantially pure means sufficiently homogeneous to appear free of readily detectable 
impurities as determined by standard methods of analysis". Since there is no clear 
guidance to one having ordinary skill in the art in qualifying the purity of an enzyme by 
ascertaining whether it is free of readily detectable impurities, it is not clear to the
Examiner as to how much of a presence of these readily detectable impurities qualifies 
an enzyme to be "substantially pure". Therefore, those skilled in the art would be 
unable to conclude what polypeptides are "substantially purified". 
Hence the rejection is maintained. 

The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall
set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-3, 11, 19-20, 34-36, 40-42 and 113-114 are rejected under 35
U.S.C. 112, first paragraph, as containing subject matter which was not described in the
specification in such a way as to reasonably convey to one skilled in the relevant art that
the inventor(s), at the time the application was filed, had possession of the claimed
invention. 

Claims 1-3, 11, 19-20, 35-36, 40-42 and 113-114 are drawn to a polypeptide
consisting of a protease domain or catalytically active fragment thereof of a type-II
membrane-type serine protease (MTSP) from any source. Claims 11 and 34 limit the
MTSP polypeptide to a MTSP1 polypeptide from any source. Therefore, these claims
are drawn to a genus of polypeptides having any structure. The specification only
teaches four species: amino acids 615-855 of SEQ ID NO:2 (MTSP1), amino acids
205-437 of SEQ ID NO:4 (MTSP3), amino acids of SEQ ID NO:6 (MTSP4) and amino
acids 217-443 of SEQ ID NO:11 (MTSP6). These species are not enough to describe
the whole genus, and there is no evidence on the record of the relationship between the
structure of the above catalytically active protease domains of SEQ ID NOs: 2, 4, 6 and
11 and the structure of the serine protease domain of any or all MTSP polypeptides or
MTSP1 polypeptides. Further, the specification does not describe the structure of a
catalytically active fragment of a protease domain of any or all MTSP polypeptides.
Therefore, the specification fails to describe a representative species of the genus of
polypeptides comprising a serine protease domain or a catalytically active portion of a
MTSP polypeptide.

Given this lack of description of the representative species encompassed by the 
genus of the claims, the specification fails to sufficiently describe the claimed invention 
in such full, clear, concise, and exact terms that a skilled artisan would recognize that 
applicants were in possession of the inventions of claims 1-3, 11, 19-20, 34-36, 40-42 
and 113-114. 

Applicant is referred to the revised guidelines concerning compliance with the
written description requirement of 35 U.S.C. 112, first paragraph, published in the Official
Gazette and also available at www.uspto.gov.

In response to the previous Office Action, applicants have traversed the above 
rejection. 

Applicants argue that the claims are fully described by the specification because 
the structural feature, a single chain protease domain, is present in all members of the 
genus and is the defining and requisite property and the specification clearly describes 
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this feature. Examiner respectfully disagrees. The recitation of "protease domain of a 
MTSP" or "MTSP1" fails to provide a sufficient description of the claimed genus of 
polynucleotides as it merely describes the functional features of the genus without 
providing any definition of the structural features of the species within the genus. The 
CAFC in UC California v. Eli Lilly (43 USPQ2d 1398) stated that: "in claims to genetic
material, however, a generic statement such as 'vertebrate insulin cDNA' or 'mammalian
insulin cDNA,' without more, is not an adequate written description of the genus
because it does not distinguish the claimed genus from others, except by function. It 
does not specifically define any of the genes that fall within its definition. It does not 
define any structural features commonly possessed by members of the genus that 
distinguish them from others. One skilled in the art therefore cannot, as one can do with 
a fully described genus, visualize or recognize the identity of the members of the 
genus." Similarly with the claimed genus of protease domains, the functional definition 
of the genus does not provide any structural information commonly possessed by 
members of the genus which distinguish the species within the genus from other 
proteins such that one can visualize or recognize the identity of the members of the 
genus. 

Applicants also argue that the claims are fully described because the
specification describes known MTSPs and identifies the protease domains thereof, as well as
unknown MTSPs and their protease domains. Examiner respectfully disagrees. The
claims are not limited to specific protease domains of specific MTSP proteins, but the 
claims are drawn to polypeptides comprising any protease domains or any or all
catalytically active fragments of said protease domains of any or all MTSP or any or all
MTSP1, including any or all recombinants, variants and mutants of said MTSP or 
MTSP1. As discussed in the written description guidelines, the written description 
requirement for a claimed genus may be satisfied through sufficient description of a 
representative number of species by actual reduction to practice, reduction to drawings, 
or by disclosure of relevant, identifying characteristics, i.e., structure or other physical 
and/or chemical properties, by functional characteristics coupled with a known or 
disclosed correlation between function and structure, or by a combination of such 
identifying characteristics, sufficient to show the applicant was in possession of the 
claimed genus. A representative number of species means that the species which are 
adequately described are representative of the entire genus. Thus, when there is 
substantial variation within the genus, one must describe a sufficient variety of 
species to reflect the variation within the genus. Satisfactory disclosure of a
representative number depends on whether one of skill in the art would recognize that 
the applicant was in possession of the necessary common attributes or features of the 
elements possessed by the members of the genus in view of the species disclosed. For 
inventions in an unpredictable art, adequate written description of a genus which 
embraces widely variant species cannot be achieved by disclosing only one species 
within the genus. In the instant case, the claimed genera of the claims are drawn to
species which are widely variant in structure. The genus of the claims is structurally
diverse, as it encompasses any catalytically active protease domain of any or all MTSP
or MTSP1, the only shared feature being serine protease activity. As such, the description of
solely the structural features present in all members of the genus is not sufficient to be
representative of the attributes and features of the entire genus.

Applicants also argue that the specification provides "relevant, identifying 
characteristics" of a representative number of species of the claimed genus. Examiner 
respectfully disagrees. The claims are drawn to polypeptides comprising any protease 
domains or any or all catalytically active fragments of said protease domains of any or 
all MTSP or any or all MTSP1, including any or all recombinants, variants and mutants
of said MTSP or MTSP1. The claims are drawn to polypeptides having any structure,
and therefore, the claims are drawn to a genus encompassing species having
substantial variation, and the specification fails to describe a representative number of species. As
discussed in the written description guidelines, the written description requirement for a 
claimed genus may be satisfied through sufficient description of a representative 
number of species by actual reduction to practice, reduction to drawings, or by 
disclosure of relevant, identifying characteristics, i.e., structure or other physical and/or
chemical properties, by functional characteristics coupled with a known or disclosed 
correlation between function and structure, or by a combination of such identifying 
characteristics, sufficient to show the applicant was in possession of the claimed genus. 
A representative number of species means that the species which are adequately 
described are representative of the entire genus. Thus, when there is substantial 
variation within the genus, one must describe a sufficient variety of species to 
reflect the variation within the genus. Satisfactory disclosure of a representative 
number depends on whether one of skill in the art would recognize that the applicant
was in possession of the necessary common attributes or features of the elements
possessed by the members of the genus in view of the species disclosed. For
inventions in an unpredictable art, adequate written description of a genus which
embraces widely variant species cannot be achieved by disclosing only one species
within the genus. In the instant case, the claimed genera of the claims are drawn to
species which are widely variant in structure. The genus of the claims is structurally
diverse, as it encompasses any catalytically active protease domain of any or all MTSP
or MTSP1, the only shared feature being serine protease activity. As such, the description of
solely the structural features present in all members of the genus is not sufficient to be
representative of the attributes and features of the entire genus.

Applicants also argue that the claims are fully described because the specification
provides at least a dozen examples of protease domains of MTSPs. Examiner
respectfully disagrees. The claims are not drawn to the specific protease domains of
the MTSPs disclosed in the specification, but to polypeptides consisting of any protease
domains or any or all catalytically active fragments of said protease domains of any or
all MTSP or any or all MTSP1, including any or all recombinants, variants and mutants
of said MTSP or MTSP1. In view of the widely variant species encompassed by the
genus, the species disclosed in the specification are not enough and do not constitute
a representative number of species to describe the whole genus of any or all variants,
recombinants and mutants of any or all polypeptides having serine protease activity
isolated from any or all sources, including any or all variants, recombinants and mutants
thereof, and there is no evidence on the record of the relationship between the structure
of the protease domain of the specific MTSPs disclosed in the specification and the
structure of any or all recombinants, variants and mutants of any or all polypeptides having
serine protease activity. Therefore, the specification fails to describe a representative
species of the genus comprising any or all polypeptides having serine protease activity,
including any or all variants, recombinants and mutants thereof.
Hence the rejection is maintained. 

Claims 1-3, 11, 19-20, 34-36, 40-42 and 113-114 are rejected under 35
U.S.C. 112, first paragraph, because the specification, while being enabling for a
polypeptide consisting of amino acids 615-855 of SEQ ID NO:2, does not reasonably
provide enablement for a polypeptide comprising any protease domain of any type II 
membrane type serine protease (MTSP) or MTSP1 or a catalytically active portion 
thereof. The specification does not enable any person skilled in the art to which it 
pertains, or with which it is most nearly connected, to make and use the invention 
commensurate in scope with these claims. 

Factors to be considered in determining whether undue experimentation is
required are summarized in In re Wands, 858 F.2d 731, 8 USPQ2d 1400 (Fed. Cir.
1988). They include (1) the quantity of experimentation necessary, (2) the amount of
direction or guidance presented, (3) the presence or absence of working examples, (4)
the nature of the invention, (5) the state of the prior art, (6) the relative skill of those in
the art, (7) the predictability or unpredictability of the art, and (8) the breadth of the
claims.

Claims 1-3, 11, 19-20, 35-36, 40-42 and 113-114 are drawn to a polypeptide
consisting of a protease domain or catalytically active fragment thereof of a type II
membrane-type serine protease (MTSP) from any source. Claims 11 and 34 limit the
MTSP polypeptide to a MTSP1 polypeptide from any source. Therefore, these claims
are drawn to polypeptides having undefined structure.

The scope of the claims is not commensurate with the enablement provided by 
the disclosure with regard to the extremely large number of polypeptides comprising a 
protease or catalytically active domain broadly encompassed by the claims. Since the 
amino acid sequence of a protein determines its structural and functional properties,
predicting which changes can be tolerated in a protein's amino acid sequence while still
obtaining the desired activity requires knowledge of and guidance with regard to which
amino acids in the protein's sequence, if any, are tolerant of modification and which are
conserved (i.e., expectedly intolerant to modification), and detailed knowledge of the
ways in which the protein's structure relates to its function. However, in this case the
disclosure is limited to the polypeptide comprising amino acids 615-855 of SEQ ID 
NO:2, or the amino acids of SEQ ID NO:50. 

It would require undue experimentation of the skilled artisan to make and use the 
claimed polypeptides. The specification is limited to teaching the use of the polypeptide
comprising amino acids 615-855 of SEQ ID NO:2 or the amino acids of SEQ ID NO:50,
but provides no guidance with regard to the making of variants and mutants or with
regard to other uses. In view of the great breadth of the claims, the amount of
experimentation required to make the claimed polypeptides, the lack of guidance and
working examples, and the unpredictability of the art in predicting function from a
polypeptide primary structure, the claimed invention would require undue
experimentation. As such, the specification fails to teach one of ordinary skill how to
use the full scope of the polypeptides encompassed by the claims.

While enzyme isolation techniques, recombinant and mutagenesis techniques 
are known, and it is routine in the art to screen for multiple substitutions or multiple 
modifications as encompassed by the instant claims, the specific amino acid positions 
within a protein's sequence where amino acid modifications can be made with a
reasonable expectation of success in obtaining the desired activity/utility are limited in
any protein and the result of such modifications is unpredictable. In addition, one skilled 
in the art would expect any tolerance to modification for a given protein to diminish with 
each further and additional modification, e.g. multiple substitutions. 

The specification does not support the broad scope of the claims which 
encompass all modifications and variants of a protease or catalytically active domain or 
modifications of amino acids 615-855 of SEQ ID NO:2 because the specification does 
not establish: (A) regions of the protein structure which may be modified without 
affecting MTSP/serine protease activity; (B) the general tolerance of MTSP to 
modification and extent of such tolerance; (C) a rational and predictable scheme for 
modifying any amino acid residue with an expectation of obtaining the desired biological 
function; and (D) the specification provides insufficient guidance as to which of the 
essentially infinite possible choices is likely to be successful.

Thus, applicants have not provided sufficient guidance to enable one of ordinary
skill in the art to make and use the claimed invention in a manner reasonably correlated
with the scope of the claims, which broadly include protease or catalytically active domains of
MTSP with an enormous number of amino acid modifications of the MTSP polypeptides
and of amino acids 615-855 of SEQ ID NO:2. The scope of the claims must bear a
reasonable correlation with the scope of enablement (In re Fisher, 166 USPQ 18, 24
(CCPA 1970)). Without sufficient guidance, determination of the serine protease
domain or the catalytically active domain of MTSP having the desired biological
characteristics is unpredictable, and the experimentation left to those skilled in the art is
unnecessarily, and improperly, extensive and undue. See In re Wands, 858 F.2d 731, 8
USPQ2d 1400 (Fed. Cir. 1988).

In response to the previous Office Action, applicants have traversed the above 
rejection. 

Applicants argue that the claims are enabled because the level of skill in the art
is high, the specification teaches that MTSP polypeptides constitute a recognized,
well-known and well-characterized family of serine proteases, and the specification
describes the protease domains of a number of MTSP family members, as well as
conserved features of MTSP protease domains. Examiner respectfully disagrees. The
scope of the claims, which are drawn to polypeptides comprising any protease domains
or any or all catalytically active fragments of said protease domains of any or all MTSP
or any or all MTSP1, including any or all recombinants, variants and mutants of said
MTSP or MTSP1, is not commensurate with the enablement provided by the disclosure
with regard to the extremely large number of polypeptides comprising a protease or
catalytically active domain broadly encompassed by the claims. Even though the
structures of some MTSPs are known, the claims are drawn to any or all serine protease domains
and catalytically active fragments of any or all protease domains of any or all MTSP or
MTSP1. As discussed above, predicting which changes can be tolerated in a
protein's amino acid sequence while still obtaining the desired activity requires specific
knowledge of and guidance with regard to which specific amino acids in the protein's
sequence can be modified such that the modified polypeptide continues to have the
claimed activity. It is this specific guidance that applicants do not provide. While the art
may teach in general the structure of MTSPs, conserved amino acid sequences, protease
domains, X-ray crystal structures, etc., such teachings will not reduce the burden of
undue experimentation on those of ordinary skill in the art.

Applicants also argue that the claims are enabled because those skilled in the art
have a high level of knowledge regarding MTSP proteins. The Examiner respectfully
disagrees. The claims are drawn to polypeptides comprising any protease domains or
any or all catalytically active fragments of said protease domains of any or all MTSP or
any or all MTSP1, including any or all recombinants, variants and mutants of said MTSP
or MTSP1. Since the amino acid sequence of the protein determines its structural and
functional properties, predicting which changes can be tolerated in a protein's
amino acid sequence while still obtaining the desired activity requires knowledge of and
guidance with regard to which amino acids in the protein's sequence, if any, are tolerant
of modification and which are conserved (i.e., expectedly intolerant to modification), and
detailed knowledge of the ways in which the protein's structure relates to its function. In
addition, the art does not provide any teaching or guidance as to which amino acids
within a serine protease can be modified and which ones are conserved such that one
of skill in the art can make the recited polypeptides having serine protease activity, or as to
the general tolerance of serine proteases to structural modifications and the extent of
such tolerance. The art clearly teaches that changing a protein's amino acid
sequence to obtain the desired activity without any guidance/knowledge as to which
amino acids in a protein are required for that activity is highly unpredictable. At the time
of the invention, there was a high level of unpredictability associated with altering a
polypeptide sequence with an expectation that the polypeptide would maintain the desired
activity. For example, Branden et al. (Introduction to Protein Structure, Garland 
Publishing Inc., New York, page 247, 1991) teach that (1) protein engineers are 
frequently surprised by the range of effects caused by single mutations that they hoped 
would change only one specific and simple property in enzymes, (2) the often surprising 
results obtained by experiments where single mutations are made reveal how little is 
known about the rules of protein stability, and (3) there are difficulties in designing de novo
stable proteins with specific functions.

Applicants argue that the specification discloses working examples, and thus a
person skilled in the art has sufficient guidance in making the claimed polypeptides.
Examiner respectfully disagrees. Even though the structures of some MTSPs are taught,
the claims are drawn not only to polypeptides comprising catalytically active fragments
of MTSP1, MTSP3, MTSP4 and MTSP6, but also to any or all mutants, variants and
recombinants of any MTSP. Without specific guidance, those skilled in the art will be
subjected to undue experimentation in making and testing each of the enormously large
number of mutants that would result from such experimentation. While the art may teach in
general the structure of MTSPs, conserved amino acid sequences, etc., such
teachings will not reduce the burden of undue experimentation on those of ordinary skill
in the art.

Hence the rejection is maintained. 



Claim Rejections - 35 USC § 102 
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(a) the invention was known or used by others in this country, or patented or described in a printed 
publication in this or a foreign country, before the invention thereof by the applicant for a patent. 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public
use or on sale in this country, more than one year prior to the date of application for patent in the United
States.

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 



Claims 1-3 and 19-20 are rejected under 35 U.S.C. 102(b) as being anticipated 
by Dawson et al. 

Claims 1-3 and 19-20 are drawn to a polypeptide consisting of a serine protease
domain of MTSP or catalytically active fragments thereof.

Dawson et al. (US Patent 5,465,833 - form PTO-892) discloses a polypeptide
consisting of serine protease domain or a catalytically active fragment thereof of a 
MTSP protein, hepsin (Figure 1). Therefore, the reference of Dawson et al. anticipates 
claims 1-3 and 19-20. 

Claims 1-3, 11-13, 19-20, 34-36, 40-42 and 113-114 are rejected under 35 
U.S.C. 102(b) as being anticipated by Takeuchi et al. 

Claims 1-3, 11-13, 19-20 and 34 are drawn to a polypeptide comprising a fragment
consisting of a serine protease domain of MTSP having the characteristics recited in the
claims. Claims 35-36 are drawn to a conjugate comprising a polypeptide comprising a
serine protease domain of MTSP and a targeting agent. Claims 40-42 and 113-114
are drawn to a solid support comprising a polypeptide comprising a serine protease 
domain of MTSP. 

Takeuchi et al. (Reference IJ: PTO-1449) teaches a polypeptide comprising a
fragment consisting of a serine protease domain that is 100% identical to amino acids
615-855 of SEQ ID NO:2 of the instant invention (page 11060, 2nd full paragraph).
Takeuchi et al. discloses a purified activated protease domain, comprising amino acids
615-855 of SEQ ID NO:2, confirmed by an N-terminal sequence of the purified,
activated protease domain yielding the expected VVGGT sequence (Figure 3 and right
column on page 11057). The MTSP of Takeuchi et al. is not expressed on normal
endothelial cells (page 11054, last paragraph, and page 11055, 2nd full paragraph), is of
human origin (Figure 1), consists essentially of the protease domain having catalytic
activity (page 11060, 2nd full paragraph), and is expressed in tumor cells (page 11055,
top paragraph).

Takeuchi et al. teaches a catalytically active polypeptide comprising the serine
protease domain linked to a His-tag (page 11055, 3rd full paragraph; page 11057, 4th full
paragraph). Takeuchi et al. also teaches a solid support comprising said polypeptide
(page 11057, 4th full paragraph and Figure 5). Therefore, the teaching of Takeuchi et
al. anticipates claims 1-3, 11-13, 19-20, 34-36, 40-42 and 113-114.

Examiner notes that the contents of the reference were made public at the 
National Academy of Sciences colloquium held February 20-21, 1999 (see top of
reference). 

In response to the previous Office Action, applicants have traversed the above 
rejections. 

Applicants argue that Takeuchi et al. does not anticipate the instant claims 
because the instant claims are drawn to a polypeptide that consists of a protease 
domain or catalytically active portion thereof. Examiner respectfully disagrees. In 
addition to the full-length MT-SP1, Takeuchi et al. also discloses a purified activated 
protease domain, consisting of amino acids 615-855 of SEQ ID NO:2, confirmed by an 
N-terminal sequence of the purified, activated protease domain yielding the expected 
VVGGT sequence (Figure 3 and right column on page 11057). Therefore, said
purified, activated protease domain anticipates the instant claims.

Applicants also argue that Takeuchi et al. does not anticipate the instant claims
because the claimed polypeptide is a single chain polypeptide. Examiner respectfully
disagrees. As discussed above, Takeuchi et al. discloses a purified activated protease
domain, consisting of amino acids 615-855 of SEQ ID NO:2, confirmed by an N-terminal
sequence of the purified, activated protease domain yielding the expected VVGGT
sequence (Figure 3 and right column on page 11057).
Hence the rejections are maintained. 



Claim Rejections - 35 USC § 102/103 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language.

The following is a quotation of 35 U.S.C. 103(a), which forms the basis for all 
obviousness rejections, set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior 
art are such that the subject matter as a whole would have been obvious at the time the invention was made to 
a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be 
negatived by the manner in which the invention was made. 



Claims 1-3, 11-13 and 34 are rejected under 35 U.S.C. 103(a) as obvious over
O'Brien et al.

Claims 1-3, 11-13 and 34 are drawn to a polypeptide comprising a serine
protease domain of MTSP. 

O'Brien et al. (U.S. Patent No. 5,972,616 - reference P, PTO-1449) teaches a
polypeptide having 100% identity to the full-length MTSP1 of SEQ ID NO:2 of the instant
invention (SEQ ID NO:2, columns 19-24). O'Brien et al. teaches a serine protease
domain having proteolytic activity that is 100% identical to amino acids 615-855 of SEQ
ID NO:2 (Figure 2, Figure 10 and SEQ ID NO:14). Further, O'Brien et al. teaches a
method of expressing polypeptides via a vector in host cells. O'Brien et al. also teaches
that the protease domain could be released and used as a diagnostic, and has the
potential to be a target for therapeutic intervention (Column 15, lines 35-38). Therefore, it
would have been obvious to one having ordinary skill in the art at the time the invention
was made to express the protease domain of SEQ ID NO:14 and purify the polypeptide.
The motivation for making such a polypeptide is to use it as a diagnostic, which has the
potential to be a target for therapeutic intervention. One of ordinary skill in the art would
have had a reasonable expectation of success since expression of a heterologous
polypeptide is routine in the art and O'Brien et al. teaches how to express heterologous
polypeptides.

Therefore, the above reference renders claims 1-3, 11-13 and 34 prima facie 
obvious to one of ordinary skill in the art. 

In response to the previous Office Action, applicants have traversed the above 
rejections.

Applicants also argue that one of skill in the art would recognize the disclosure of
the polypeptide of O'Brien as not disclosing a single chain polypeptide. Examiner
respectfully disagrees. A single chain polypeptide is one sequence of amino acids
beginning with a carboxyl end and terminating with an amino end, wherein the amino
acids are connected via peptide bonds. Therefore, the protease domain obtained from
O'Brien et al. can be construed as a single chain polypeptide.

Applicants also argue that O'Brien et al. provides no teaching or suggestion of 
smaller fragments having serine protease activity because it does not teach how to 
make a single chain polypeptide that has serine protease activity. Examiner respectfully 
disagrees. O'Brien et al. teaches a method of expressing polypeptides via a vector in 
host cells. It is well within the skill available in the art to purify the protease domain 
since O'Brien et al. identifies the protease domain. Therefore, it would have been
obvious to one having ordinary skill in the art at the time the invention was made to
express the protease domain of SEQ ID NO:14 and purify the polypeptide. The
motivation for making such a polypeptide is to use it as a diagnostic, which has the
potential to be a target for therapeutic intervention. One of ordinary skill in the art would
have had a reasonable expectation of success since expression of a heterologous 
polypeptide is routine in the art and O'Brien et al. teaches how to express heterologous 
polypeptides. 

Applicants again argue that at the time of filing the instant application, one of skill 
in the art would not have had a reasonable expectation of success to express the 
protease domain because the art evidences that a single-chain polypeptide would not
have been expected to have protease activity. Examiner respectfully disagrees. The
claims are drawn to a polypeptide comprising a fragment consisting of a protease
domain of SEQ ID NO:2. Therefore, said polypeptide being a single-chain
polypeptide is an inherent property of said polypeptide, since two polypeptides having
identical structure will have identical function and physical and chemical properties.
Hence the rejections are maintained. 

Claims 35-36, 40-42 and 113-114 are rejected under 35 U.S.C. 103(a) as being
unpatentable over O'Brien et al. 

Claims 35-36 are drawn to a conjugate comprising a polypeptide comprising a 
serine protease domain of MTSP and a targeting agent. Claims 40-42 and 113-114 are 
drawn to a solid support comprising a polypeptide comprising a serine protease domain 
of MTSP. 

O'Brien et al. (U.S. Patent No. 5,972,616 - reference P, PTO-1449) teaches a
polypeptide having 100% identity to the full-length MTSP1 of SEQ ID NO:2 of the instant
invention, as discussed above. O'Brien et al. also teaches that the protease domain
could be released and used as a diagnostic, and has the potential to be a target for
therapeutic intervention (Column 15, lines 35-38).

O'Brien et al. also teaches a method of making fragments of SEQ ID NO:2
(Column 9, lines 22-55). O'Brien et al. teaches said fragments linked to another
polypeptide (Column 9, lines 54-55) and conjugated to bridging molecules (Column 6,
lines 27-39) for detecting the polypeptide. Assays using polypeptides linked to the
molecules taught by O'Brien et al. utilize solid supports.

Therefore, it would have been obvious to one having ordinary skill in the art at
the time the claimed invention was made to make a polypeptide comprising the
serine protease domain of SEQ ID NO:2 taught by O'Brien et al. and to make
conjugates and solid supports comprising a polypeptide comprising the serine
protease domain of SEQ ID NO:2. The motivation for making such a polypeptide is to
use it as a diagnostic, which has the potential to be a target for therapeutic intervention.
The motivation for making conjugates and solid supports comprising said polypeptide
is to use the conjugates and solid supports in a variety of diagnostic assays. One of
ordinary skill in the art would have had a reasonable expectation of success because making
fragments of a polypeptide is routine in the art and O'Brien et al. teaches how to make
fragments of SEQ ID NO:2. One of ordinary skill in the art would also have had a
reasonable expectation of success because diagnostic assays using conjugates and solid
supports comprising a polypeptide are very well known, as taught by O'Brien et al.

Therefore, the above references render claims 35-36 and 40-42 prima facie 
obvious to one of ordinary skill in the art. 

In response to the previous Office Action, applicants have traversed the above
rejections. Applicants argue that the teachings of O'Brien et al. do not result in the
instantly claimed compositions because O'Brien et al. does not teach or suggest a
single chain polypeptide that includes a MTSP protease domain where the polypeptide
does not include any additional MTSP portions and the polypeptide has serine protease
activity. O'Brien et al. does teach or suggest a single chain polypeptide comprising a
MTSP portion, wherein the MTSP portion is a protease domain, wherein the MTSP
portion has serine protease activity, and wherein the MTSP portion is the only portion of
the polypeptide, because O'Brien et al. identifies the serine protease domain and one
having ordinary skill in the art at the time the invention was filed would have been
motivated to purify the serine protease domain of O'Brien et al., as discussed above.

Hence the rejection is maintained. 

Claims 19-20 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
O'Brien et al. and Estell et al. in view of Takeuchi et al. 

Claims 19-20 are drawn to a polypeptide comprising the serine protease domain 
of a MTSP wherein free Cys residues are substituted with Ser residues. 

O'Brien et al. teaches a serine protease domain of a MTSP polypeptide, as 
discussed above. 

The reference of O'Brien et al. does not teach a serine protease domain of a
MTSP polypeptide wherein free Cys residues have been replaced with Ser residues.

It is well known in the art that proteins form disulfide bonds via the SH groups of
Cys residues. Upon making a polypeptide comprising a serine protease domain, a Cys
residue that normally forms a disulfide bond in the full length polypeptide may be left
free. For example, Takeuchi et al. (Reference U, PTO-1449) teaches that the cysteine at
position 731 of SEQ ID NO:2 normally forms a disulfide bond with a Cys residue in the
pro-protease domain (see page 11060, top left paragraph, and Figures 1 and 2).

Cys residues are sensitive to oxidation due to their SH side group. Estell et al.
(U.S. Patent No. 5,346,823) teaches replacing Cys residues with Ser residues to
decrease a polypeptide's susceptibility to oxidation (Abstract and Column 10, lines
34-38). Ser residues have side chains similar to those of Cys residues, and substitution
of a Cys residue with a Ser residue is a conservative substitution.

Therefore, it would have been obvious to one having ordinary skill in the art at 
the time the claimed invention was made to replace free Cys residues in the protease 
domain taught by O'Brien et al. with Ser residues. One of ordinary skill in the art would
be motivated to make such a change in order to enhance stability of the polypeptide. 
One of ordinary skill in the art would have had a reasonable expectation of success 
since Estell et al. teaches successful decrease of a protein's susceptibility to oxidation 
by substituting residues sensitive to oxidation with conservative substitutions. 

Therefore, the above references render claims 1, 16, 18-20, 34 and 137
prima facie obvious to one of ordinary skill in the art.

In response to the previous Office Action, applicants have traversed the above
rejections. Applicants argue that the combination of the teachings of O'Brien et al. with
the teachings of Estell et al. and Takeuchi et al. does not result in the instantly claimed
methods because O'Brien et al. does not teach or suggest a single chain polypeptide
that includes a MTSP protease domain where the polypeptide does not include any
additional MTSP portions and the polypeptide has serine protease activity, and that
neither Takeuchi et al. nor Estell et al. remedies the defects of O'Brien et al. First, the
claims are product claims and not method claims. Second, O'Brien et al. does teach or
suggest a single chain polypeptide comprising a MTSP portion, wherein the MTSP
portion is a protease domain, wherein the MTSP portion has serine protease activity,
and wherein the MTSP portion is the only portion of the polypeptide, because O'Brien et
al. identifies the serine protease domain and one having ordinary skill in the art at the
time the invention was filed would have been motivated to purify the serine protease
domain of O'Brien et al., as discussed above.

Applicants argue that Takeuchi et al. teaches that every cysteine residue of the
protein is disulfide bonded and that Takeuchi et al. therefore does not teach or suggest an
MTSP protease domain having a free Cys residue. Examiner respectfully disagrees.
Figure 4, to which applicants refer, illustrates disulfide bonds of cysteine residues of the
full length MTSP; for example, the Cys at position 830 is disulfide bonded to the Cys at
position 191.

Hence the rejection is maintained. 

None of the claims are in condition for allowance. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Yong Pak whose telephone number is 571-272-0935. 
The examiner can normally be reached 6:30 A.M. to 5:00 P.M. Monday through 
Thursday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Ponnathapu Achutamurthy, can be reached on 571-272-0928. The fax
phone number for the organization where this application or proceeding is assigned is 
571-273-8300. 
Any inquiry of a general nature or relating to the status of this application or
proceeding should be directed to the receptionist whose telephone number is
571-272-1600.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll free). 



Yong D. Pak 

Patent Examiner 1652 
THE COVER

Front: The background photograph of the cover is of a Laue x-ray diffraction
pattern produced by a crystal of the plant enzyme ribulose bisphosphate
carboxylase. This technique is described in Chapter 17. Information derived
from such x-ray patterns, together with a knowledge of the amino acid
sequence, enabled the three-dimensional arrangement of atoms in the protein
to be determined. A simplified representation of this protein structure is shown
in color, superimposed on the diffraction pattern. The enzyme, which is
involved in the fixation of carbon dioxide, is a member of the large class of
α/β barrel protein structures. This class of structures is discussed in detail in
Chapter 4.

Back: Tomato bushy stunt virus is a spherical virus made from 180 protein
subunits. Arms extending from sixty of these subunits contribute to an internal
framework that determines the size of the correctly assembled virus particle. The
interdigitated arms from three subunits meet at each of the twenty icosahedral
threefold axes of the virus. One such axis is shown here with the β strands from
three subunits shown in different shades of green. Virus structure is described
in more detail in Chapter 11.
... and Design of Protein Structures
Over a period of more than 3 billion years a large variety of protein molecules
has evolved to run the complex machinery of present-day cells and organisms.
Most of us believe that these molecules have evolved by random mutation of
genes and natural selection for those gene products that have conferred some
functional advantage contributing to the survival of individual organisms.

Long before Darwin and Wallace proposed the theory of evolution and
Mendel discovered the laws of genetics, plant and animal breeders had begun
to interfere with the process of evolution in the species that gave rise to
domesticated animals and cultivated plants. Considering their total lack of
knowledge of both evolutionary theory and genetics, their achievements,
brought about by forcing the pace of and subverting natural selection, were
impressive albeit very gradual. With the advent of molecular genetics and in
particular techniques for gene cloning and gene insertion, we are now entering
an era of genetic exploitation of other organisms undreamed of only 50 years
ago. We can now begin to design genes to produce in other organisms novel
gene products for the benefit of human beings; we are no longer restricted to
selecting useful genes that arise by mutation. We are, however, only at the
beginning of this new era, and so far we have only scratched the surface of the
knowledge that is required for true engineering and design of protein molecules.

We distinguish protein engineering, by which we mean mutating the gene
of an existing protein in an attempt to alter its function in a predictable way, from
protein design, which has the more ambitious goal of designing de novo a
protein to fulfill a desired function.

Protein engineers frequently have been surprised by the range of effects
caused by single mutations that they hoped would change only one specific and
simple property in an enzyme; some examples are described in Chapter 15. The
often surprising results of such experiments reveal how little we know about the
rules of protein stability and the energetics of ligand binding and catalytic
efficiency; they also serve to emphasize how difficult it is to design de novo stable
proteins with specific functions. However, by using the methods of engineering
and design, we are now at least increasing rapidly our basic knowledge of the
function of protein molecules. For example, we now know that the difference
in energetic terms between the transition states of a naturally evolved useful
enzyme and an engineered useless mutant corresponds to less than the energy
of a single hydrogen bond, even for such important life-sustaining enzymes as
the CO2-fixing enzyme in green plants, Rubisco (ribulose-1,5-bisphosphate
carboxylase/oxygenase).

Knowledge of a protein's tertiary structure is a prerequisite for the proper
engineering of its function. Unfortunately, in spite of recent significant techno-
Reverse biochemistry: Use of macromolecular protease inhibitors
to dissect complex biological processes and identify a membrane-type
serine protease in epithelial cancer and normal tissue

TOSHIHIKO TAKEUCHI*, MARC A. SHUMAN†, AND CHARLES S. CRAIK*‡

*Departments of Pharmaceutical Chemistry and Biochemistry and Biophysics, and †Department of Medicine, University of California, San Francisco, CA



ABSTRACT Serine proteases of the chymotrypsin fold are of great interest because they provide detailed understanding of their enzymatic properties and their proposed role in a number of physiological and pathological processes. We have been developing the macromolecular inhibitor ecotin to be a "fold-specific" inhibitor that is selective for members of the chymotrypsin-fold class of proteases. Inhibition of protease activity through the use of wild-type and engineered ecotins results in inhibition of rat prostate differentiation and retardation of the growth of human PC-3 prostatic cancer tumors. In an effort to identify the proteases that may be involved in these processes, reverse transcription-PCR with PC-3 poly(A)+ mRNA was performed by using degenerate oligonucleotide primers. These primers were designed by using conserved protein sequences unique to chymotrypsin-fold serine proteases. Five proteases were identified: urokinase-type plasminogen activator, factor XII, protein C, trypsinogen IV, and a protease that we refer to as membrane-type serine protease 1 (MT-SP1). The cloning and characterization of the MT-SP1 cDNA shows that it encodes a mosaic protein that contains a transmembrane signal anchor, two CUB domains, four LDLR repeats, and a serine protease domain. Northern blotting shows broad expression of MT-SP1 in a variety of epithelial tissues with high levels of expression in the human gastrointestinal tract and the prostate. A His-tagged fusion of the MT-SP1 protease domain was expressed in Escherichia coli, purified, and autoactivated. Ecotin and variant ecotins are subnanomolar inhibitors of the MT-SP1 activated protease domain, suggesting a possible role for MT-SP1 in prostate differentiation and the growth of prostatic carcinomas.

Serine proteases possessing a chymotrypsin fold are of great interest because they provide detailed understanding of their enzymatic properties and their proposed role in a number of physiological and pathological processes. A wealth of information exists on structure-function relationships regarding this large class of enzymes. Moreover, potent and specific inhibitors are readily available for use in dissecting the function of these enzymes. These proteases exist as precursors that are activated by specific and limited proteolysis, allowing regulation of enzyme activity (1). Examples of this type of regulation include blood coagulation (2), fibrinolysis (3), complement activation (4), and trypsinogen activation by enteropeptidase in digestion (5). The precise control of these activation processes is crucial for normal physiological enzymatic function; misregulation of these enzymes can lead to pathological conditions (2-5).

We are interested in studying the role of these chymotrypsin-fold serine proteases in cancer by using a "fold-specific" inhibitor, ecotin (6, 7). Ecotin or engineered versions of ecotin can be introduced into complex biological systems as probes of proteolysis by these chymotrypsin-fold proteases. If effects are observed on treatment with these unique inhibitors, then the large body of knowledge concerning the biochemistry of these proteases can be tapped to understand the structure and function of the target proteases. For example, the molecular cloning, structural modeling, and mechanistic understanding of the enzymes are immediately accessible. We refer to this approach, which is analogous to "reverse genetics," as "reverse biochemistry," and we have applied it to identification of specific serine proteases in prostate cancer.

PNAS is available online at www.pnas.org.

Urokinase-type plasminogen activator (uPA) has been implicated in tumor-cell invasion and metastasis. Cancer-cell invasion into normal tissue can be facilitated by uPA through its activation of plasminogen, which degrades the basement membrane and extracellular matrix (reviewed in refs. 8 and 9). The role of other serine proteases in cancer has been less well characterized.

One useful model system for studying many issues that are pertinent to prostate cancer is the development of the rodent ventral prostate in explant cultures. Macromolecular inhibitors of serine proteases of the chymotrypsin fold, ecotin and ecotin M84R/M85R (6, 7), inhibit ductal branching morphogenesis and differentiation of the explanted rat ventral prostate (F. Elfman, T.T., C.C., G. Cunha, and M.S., unpublished data). Ecotin M84R/M85R is a 2,800-fold more potent inhibitor of uPA than ecotin (1 nM vs. 2.8 µM) (6). However, inhibition of prostate differentiation was seen with both inhibitors, suggesting that uPA and other related serine proteases are involved in the differentiation and continued growth of the rat ventral prostate. Thus, unidentified serine proteases may play a role in growth and prevention of apoptosis in prostate epithelial cells in this system.

Another well characterized model that is derived from human prostate cancer epithelial cells is the PC-3 cell line (10). The PC-3 cell line expresses uPA as assayed by ELISA and by Northern blotting of PC-3 mRNA (11). We found that the primary tumor size in PC-3-implanted nude mice was significantly smaller in both ecotin M84R/M85R-treated and wild-type ecotin-treated mice treated for 7 weeks compared with the primary tumor size of PBS-treated mice. Metastases from the primary tumors were similarly lower in the inhibitor-treated mice than in PBS-treated mice (O. Melnyk, T.T., C.C., and M.S., unpublished data). Inhibition was not unexpected with ecotin M84R/M85R treatment, because uPA has been implicated in metastasis. However, wild-type ecotin is a poor, micromolar inhibitor of uPA; one interpretation of the data is that the decrease in tumor size and metastasis in the mouse model involves the inhibition of additional serine proteases. Thus, identification of the serine proteases expressed by PC-3 prostate cells may provide insight into the role of these proteases in cancer and prostate growth and development. In this report we have extended the strategy of using PCR with degenerate oligonucleotide primers that were designed by using conserved sequence homology (12-14) to identify additional serine proteases made by cancer cells. Five independent serine protease cDNAs derived from PC-3 mRNA were sequenced, including a novel serine protease, which we refer to as membrane-type serine protease 1 (MT-SP1), and the cloning and characterization of this cDNA that encodes a mosaic, transmembrane protease is reported.

Abbreviations: MT-SP1, membrane-type serine protease 1; CUB, complement factor 1R-urchin embryonic growth factor-bone morphogenetic protein; LDLR, low density lipoprotein receptor; uPA, urokinase-type plasminogen activator; pNA, p-nitroanilide.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. BankIt257050 and ...).

‡To whom reprint requests should be addressed. E-mail: craik@cgl.ucsf.edu.

MATERIALS AND METHODS 

Materials. All primers used were synthesized on an Applied Biosystems 391 DNA synthesizer. All restriction enzymes were purchased from New England Biolabs. Automated DNA sequencing was carried out on an Applied Biosystems 377 Prism sequencer, and manual DNA sequencing was carried out under standard conditions. N-terminal amino acid sequencing was performed on an ABI 477A by the University of California, San Francisco Biomolecular Resource Center. The synthetic substrates, Suc-AAPX-p-nitroanilide (pNA) [N-succinyl-alanyl-alanyl-prolyl-Xxx-pNA (Xxx = alanyl, aspartyl, glutamyl, phenylalanyl, leucinyl, methionyl, or arginyl)] and H-Arg-pNA (arginyl-pNA), were purchased from Bachem. Deglycosylation was performed by using PNGase F (NEB, Beverly, MA). All other reagents were of the highest quality available and purchased from Sigma or Fisher unless otherwise noted.

Isolation of cDNA from PC-3 Cells. mRNA was isolated from PC-3 cells by using the PolyATtract System 1000 kit (Promega). Reverse transcription was primed by using the "lock-docking" oligo(dT) primer (15). SuperScript II reverse transcriptase (Life Technologies, Grand Island, NY) was used in accordance with the manufacturer's instructions to synthesize the cDNA from the PC-3 mRNA.

Amplification of MT-SP1 Gene. The degenerate primers used for amplifying the protease domains were designed from the consensus sequences flanking the catalytic histidine (5' His-primer) and the catalytic serine (3' Ser-primer), similar to those described (12). The 5' primer used is as follows: 5'-TGG (AG)TI (CAG)TI (AT)(GC)I GCI (GA)CI CA(CT) TG-3', where nucleotides in parentheses represent equimolar mixtures and I represents deoxyinosine. This primer encodes at least the following amino acid sequence: W (I/V) (I/L/V) (S/T) A (A/T) H C. The 3' primer used is as follows: 5'-IGG ICC ICC I(CG)(AT) (AG)TC ICC (CT)TI (GA)CA IG(ATC) (GA)TC-3'. The reverse complement of the 3' primer encodes at least the following amino acid sequence: D (A/S/T) C (K/E/Q/H) G D S G G P.
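The degeneracy and coding capacity of the 5' His-primer can be checked mechanically. The sketch below is a minimal Python illustration, assuming the primer reads 5'-TGG (AG)TI (CAG)TI (AT)(GC)I GCI (GA)CI CA(CT) TG-3' and assuming, as a simplification, that deoxyinosine (I) pairs with any base (in practice it pairs preferentially with C). It counts the distinct oligonucleotides in the mixture and lists every amino acid each complete codon can encode; the helper names (`parse_primer`, `encoded_residues`) are illustrative and not part of the published protocol.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def parse_primer(primer: str) -> list[str]:
    """Split e.g. 'TGG (AG)TI' into one token per nucleotide position.

    A parenthesised group is an equimolar base mixture; 'I' is deoxyinosine."""
    text = primer.replace(" ", "")
    positions, i = [], 0
    while i < len(text):
        if text[i] == "(":
            end = text.index(")", i)
            positions.append(text[i + 1:end])
            i = end + 1
        else:
            positions.append(text[i])
            i += 1
    return positions


def degeneracy(positions: list[str]) -> int:
    """Distinct oligonucleotides synthesised; inosine is a single physical base."""
    count = 1
    for token in positions:
        count *= len(token)
    return count


def possible_bases(token: str) -> str:
    # Simplifying assumption: inosine is treated as a wildcard for decoding.
    return "ACGT" if token == "I" else token


def encoded_residues(positions: list[str]) -> list[set[str]]:
    """Amino acids that each complete codon of the primer can encode."""
    residues = []
    for start in range(0, len(positions) - 2, 3):
        codon = [possible_bases(t) for t in positions[start:start + 3]]
        residues.append({CODON_TABLE["".join(c)] for c in product(*codon)})
    return residues


five_prime = parse_primer("TGG (AG)TI (CAG)TI (AT)(GC)I GCI (GA)CI CA(CT) TG")
print(degeneracy(five_prime))  # 96
for aa_set in encoded_residues(five_prime):
    print(sorted(aa_set))
```

Because inosine is treated as a wildcard here, each set is a superset of the residues named in the text (the second codon, for example, also admits Met via ATG), which is consistent with the wording "encodes at least". The trailing TG is an incomplete codon and is skipped.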

Direct amplification of serine protease cDNA was not possible by using the above primers. Instead, the first PCR was performed with the 5' His-primer and the oligo(dT) primer described above, by using the "touchdown" PCR protocol (16), with annealing temperatures decreasing from 52 C to C over 22 rounds and 13 final rounds at 54°C annealing temperature. Cycle times were 1 min (denaturing), 1 min (annealing), and 2 min (extension) and were followed by one final extension time of 15 min after the final round of PCR. The template for the second PCR was 0.5 µl (total reaction volume 50 µl) of a 1:10 dilution of the first PCR mixture that was performed with the 5' His-primer and the oligo(dT) primer. The second PCR reaction was primed with the 5' His- and the 3' Ser-primers and performed by using the touchdown protocol described above. All PCRs used 12.5 pmol of primer per 50-µl reaction volume.
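A touchdown program is easy to lay out as a list of cycles. The following Python sketch is illustrative only: the cycle counts (22 touchdown plus 13 final rounds), the 1/1/2-min step times, the 15-min final extension and the 54°C plateau come from the text, but the starting and ending ramp temperatures (64°C and 55°C) are hypothetical placeholders because the legible values are missing from the source, and the linear one-step-per-cycle ramp is an assumption. The names `Cycle`, `touchdown_program` and `program_minutes` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Cycle:
    """One PCR cycle; default step times (minutes) follow the text."""
    anneal_c: float
    denature_min: float = 1.0
    anneal_min: float = 1.0
    extend_min: float = 2.0


def touchdown_program(start_c, ramp_end_c, plateau_c,
                      touchdown_cycles=22, final_cycles=13):
    """Linear touchdown ramp followed by a fixed-temperature plateau.

    The linear ramp shape is an assumption; the source only states that the
    annealing temperature decreases over the touchdown rounds."""
    if touchdown_cycles < 2:
        raise ValueError("a ramp needs at least two cycles")
    step = (start_c - ramp_end_c) / (touchdown_cycles - 1)
    cycles = [Cycle(anneal_c=start_c - n * step) for n in range(touchdown_cycles)]
    cycles += [Cycle(anneal_c=plateau_c) for _ in range(final_cycles)]
    return cycles


def program_minutes(cycles, final_extension_min=15.0):
    """Total hold time in minutes, ignoring block ramp rates between steps."""
    held = sum(c.denature_min + c.anneal_min + c.extend_min for c in cycles)
    return held + final_extension_min


# Hypothetical ramp endpoints; plateau, cycle counts and timings are from the text.
program = touchdown_program(start_c=64.0, ramp_end_c=55.0, plateau_c=54.0)
print(len(program), program_minutes(program))  # 35 155.0
```

With 35 cycles of 4 min each plus the 15-min final extension, each reaction needs roughly 155 min of hold time regardless of the ramp temperatures chosen.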

The product of the second reaction was purified on a 2% agarose gel, and all products between 400 and 550 bp were cut from the gel and extracted by using the QIAquick gel extraction kit (Qiagen, Chatsworth, CA). These products were digested with the BamHI restriction enzyme to cut any uPA cDNA, and all 400- to 500-bp fragments were repurified on a 2% agarose gel. These reaction products were subjected to a third PCR with the 5' His-primer and the 3' Ser-primer by using the identical touchdown procedure. These reaction products were gel-purified and directly cloned into the pPCR2.1 vector by using the TOPO TA ligation kit (Invitrogen). DNA sequencing of the inserts determined the cDNA sequence from nucleotides 1,984 to 2,460 (see Fig. 1).

Northern Blot Analysis. 32P-labeled nucleotides were purchased from Amersham Pharmacia. A cDNA fragment containing nucleotides 1,173-2,510 was digested from expressed sequence tag w39209 by using restriction enzymes EcoRI and BamHI, yielding a 1.3-kilobase nucleotide insert. Labeled cDNA probes were synthesized by using the Rediprime random primer labeling kit (Amersham Pharmacia) and 20 ng of the purified insert. Poly(A)+ RNA blots for Northern blotting were purchased from Origene (Rockville, MD; HB-1002, HB-IOIS) and CLONTECH (Human II 7759-1, Human Cancer Cell Line 7757). The blots were performed under stringent annealing conditions as described in ref. 17.

Construction of Expression Vectors. The mature protease domain and a small portion of the pro-domain (nucleotides 1,822-2,601) cDNA were amplified by using PCR from expressed sequence tag w39209 and ligated into the pQE30 vector (Qiagen). This construct is designed to overexpress the protease sequence from amino acids 596-855 with the following fusion: Met-Arg-Gly-Ser-His6-aa596-855. The His6 tag fusion allows affinity purification by using metal-chelate chromatography. The change from Ser-805, encoded by TCC, to Ala, encoded by GCT, was performed by using PCR. The presence of the correct Ser→Ala substitution in the pQE30 vector was verified by DNA sequence analysis.

Expression and Purification of the Protease Domain. The above-mentioned plasmids were separately transformed into Escherichia coli X-90 to afford high-level expression of recombinant protease gene products (18). Expression and purification of the recombinant enzyme from solubilized inclusion bodies was performed as described (19). Protein-containing fractions were pooled and dialyzed overnight at 4°C against 50 mM Tris (pH 8), 10% glycerol, 1 mM 2-mercaptoethanol, and 3 M urea. Autoactivation of the protease was monitored on dialysis against storage buffer (50 mM Tris, pH 8/10% glycerol) at 4°C by using the substrate Spectrozyme tPA (hexahydrotyrosyl-Gly-Arg-pNA, American Diagnostica, Greenwich, CT). Hydrolysis of Spectrozyme tPA was monitored at 405 nm for the formation of p-nitroaniline by using a Uvikon 860 spectrophotometer. Activated protease was bound to an immobilized p-aminobenzamidine resin (Pierce) that had been equilibrated with storage buffer. Bound protease was eluted with 100 mM benzamidine, and the protein-containing fractions were pooled. Excess benzamidine was removed by using FPLC with a Superdex 70 (Amersham Pharmacia) gel filtration column that was equilibrated with storage buffer. Protein-containing fractions were pooled and stored at -80°C. The cleavage of the purified Ser805Ala protease domain was performed at 37°C by addition of active recombinant protease domain to 10 nM. Cleavage was monitored by using SDS/PAGE.

Fig. 1. Nucleotide sequence of the cDNA encoding [...] amino acid residue. Amino acids are shown in single-letter code. [...] at nucleotide 32. The predicted [...] underlined. The catalytic triad in the serine protease domain is highlighted: His-656, Asp-711, and Ser-805.

Determination of Substrate Kinetics. The purified serine protease domain was titrated with 4-methylumbelliferyl p-guanidinobenzoate (MUGB) to obtain an accurate concentration of enzyme active sites (20). Enzyme activity was monitored at 25°C in assay buffer containing 50 mM Tris (pH 8.8), 50 mM NaCl, and 0.01% Tween 20. The final concentration of substrate Spectrozyme tPA ranged from 1 to 400 µM. Enzyme concentrations ranged from 40 to 800 pM. Active-site titrations were performed on a Fluoromax-2 spectrofluorimeter. Measurements were plotted by using the KALEIDAGRAPH program (Synergy Software, Reading, PA), and Km, kcat, and kcat/Km for Spectrozyme tPA were determined by using the Michaelis-Menten equation.
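The relationship between the fitted constants can be made concrete. The authors fitted the Michaelis-Menten equation in KaleidaGraph; the minimal Python sketch below instead uses the Hanes-Woolf linearisation (s/v = s/Vmax + Km/Vmax) solved by ordinary least squares, which is a swapped-in technique chosen only because it needs no external libraries. kcat is then Vmax divided by the active-site concentration obtained from the MUGB titration. The rates are synthetic and noise-free, generated with hypothetical values (Km = 50 µM, kcat = 10 s^-1, 100 pM active enzyme); only the 1-400 µM substrate range comes from the text.

```python
def fit_hanes_woolf(s, v):
    """Estimate (Vmax, Km) by least squares on s/v = s/Vmax + Km/Vmax."""
    y = [si / vi for si, vi in zip(s, v)]
    n = len(s)
    mean_s, mean_y = sum(s) / n, sum(y) / n
    sxx = sum((si - mean_s) ** 2 for si in s)
    sxy = sum((si - mean_s) * (yi - mean_y) for si, yi in zip(s, y))
    slope = sxy / sxx                    # equals 1 / Vmax
    intercept = mean_y - slope * mean_s  # equals Km / Vmax
    vmax = 1.0 / slope
    return vmax, intercept * vmax


def kinetic_constants(s, v, active_sites):
    """Return (Km, kcat, kcat/Km); active_sites comes from an active-site titration."""
    vmax, km = fit_hanes_woolf(s, v)
    kcat = vmax / active_sites
    return km, kcat, kcat / km


# Hypothetical enzyme: Km = 50 uM, kcat = 10 s^-1, 1e-4 uM (100 pM) active sites.
KM_TRUE, KCAT_TRUE, E_ACTIVE = 50.0, 10.0, 1e-4
s = [1, 5, 10, 25, 50, 100, 200, 400]  # uM, within the 1-400 uM range used
v = [KCAT_TRUE * E_ACTIVE * si / (KM_TRUE + si) for si in s]  # uM/s
km, kcat, specificity = kinetic_constants(s, v, E_ACTIVE)
print(f"Km={km:.1f} uM, kcat={kcat:.2f} s^-1, kcat/Km={specificity:.3f} uM^-1 s^-1")
```

With real, noisy initial rates a direct nonlinear fit, as the authors used, is generally preferable, because linearisations reweight the experimental error unevenly across the substrate range.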

Inhibition of MT-SP1 Protease Domain with Ecotin and Ecotin M84R/M85R. Ecotin and ecotin M84R/M85R were purified from E. coli as described (6). Various concentrations of ecotin or ecotin M84R/M85R were incubated with the His-tagged serine protease domain in a total volume of 990 µl of buffer containing 50 mM NaCl, 50 mM Tris-HCl (pH 8.8), and 0.01% Tween 20. Ten microliters of Spectrozyme tPA was added, yielding a solution containing 100 µM substrate. The final enzyme concentration was 63 pM, and the ecotin and ecotin M84R/M85R concentrations ranged from 0.1 to 50 nM. The data were fit to the equation derived for kinetics of reversible tight-binding inhibitors (21, 22), and the values for apparent Ki were determined.
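For tight-binding inhibitors the free inhibitor concentration cannot be approximated by the total, so fractional activity follows a quadratic rather than a simple hyperbola. The sketch below assumes the equation meant by refs. 21 and 22 is the Morrison quadratic, v_i/v_0 = 1 - ([E] + [I] + Ki_app - sqrt(([E] + [I] + Ki_app)^2 - 4[E][I])) / (2[E]), written here in an algebraically equivalent, numerically stable form. A coarse grid search on log10(Ki_app) stands in for the nonlinear least-squares fit a real analysis would use. The 63 pM enzyme concentration and the 0.1-50 nM inhibitor range follow the text; the inhibitor spacing and the Ki_app of 0.8 nM used to simulate data are hypothetical.

```python
import math


def morrison_fraction(i_total, e_total, ki_app):
    """Fractional activity v_i/v_0 for a tight-binding inhibitor (Morrison equation).

    The EI complex is the smaller root of the binding quadratic, computed as
    2*E*I / (b + sqrt(b^2 - 4*E*I)) to avoid cancellation."""
    b = e_total + i_total + ki_app
    complex_ei = 2.0 * e_total * i_total / (b + math.sqrt(b * b - 4.0 * e_total * i_total))
    return 1.0 - complex_ei / e_total


def fit_ki_app(i_values, fractions, e_total, lo=1e-4, hi=1e4, steps=4000):
    """Grid search on log10(Ki_app) minimising the sum of squared residuals."""
    best_ki, best_sse = lo, math.inf
    span = math.log10(hi) - math.log10(lo)
    for k in range(steps + 1):
        ki = lo * 10 ** (span * k / steps)
        sse = sum((morrison_fraction(i, e_total, ki) - f) ** 2
                  for i, f in zip(i_values, fractions))
        if sse < best_sse:
            best_ki, best_sse = ki, sse
    return best_ki


E_NM = 0.063  # 63 pM enzyme, as in the text
inhibitor_nm = [0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 50.0]  # hypothetical spacing
observed = [morrison_fraction(i, E_NM, 0.8) for i in inhibitor_nm]  # simulated data
print(f"fitted Ki_app = {fit_ki_app(inhibitor_nm, observed, E_NM):.3f} nM")
```

All concentrations must share one unit (nM here); the grid spacing of 0.002 decades limits the recovered Ki_app to within about 0.5% of the simulated value.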

RESULTS 

Cloning of Serine Protease Domain cDNAs from PC-3 Cells and Amplification of MT-SP1 cDNA. PCR amplification of serine protease cDNA was performed by using "consensus cloning," where the amplification was performed with degenerate primers designed to anneal to cDNA encoding the region about the conserved catalytic histidine (5' His-primer) and the conserved catalytic serine (3' Ser-primer). The consensus primers were designed by using 37 human sequences within a sequence alignment of 242 serine proteases of the chymotrypsin fold that are reported in the SwissProt database. To bias the screen for previously unidentified proteases in the PC-3 cDNA, uPA cDNA was cut and removed by using the known BamHI endonuclease site in the uPA cDNA sequence. The expected size of the cDNA fragments amplified between His-57 and Ser-195 cDNA (standard chymotrypsinogen numbering) is between 400 and 550 bp; statistically, only 1 in 10 cDNAs of that length will be cleaved by BamHI. Thus, cDNAs obtained from the PCR reactions with the 5' His-primer and 3' Ser-primer were size selected for the 400- to 550-bp range, digested with BamHI, and purified from any digested cDNAs. After a subsequent round of PCR, the products were cloned into pPCR2.1 (Fig. 2). Twenty clones were digested with EcoRI to monitor the size of the cDNA insert. Three clones lacked inserts of the correct size. The remaining 17 clones containing inserts between 400 and 550 bp were sequenced. BLAST searches of the resulting sequences revealed that six clones did not match serine protease sequences. The remaining cDNAs yielded clones corresponding to factor XII (two clones), protein C (two clones), trypsinogen type IV (two clones), uPA (one clone), and MT-SP1 (four clones). Additional serine protease sequences may not have been found because they were digested by BamHI, lost in the size selection, or present in lower frequencies.
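The "1 in 10" estimate can be reproduced from first principles. The sketch below assumes a random sequence with equiprobable, independent bases, which real cDNA is not (codon usage and GC content skew it), so it is only a rough sanity check. BamHI recognises the 6-bp palindrome GGATCC, so scanning one strand covers both; each of the length - 5 start positions is treated as an independent trial with probability 1/4^6, a close approximation for a site that cannot overlap itself. A seeded Monte Carlo simulation checks the same number. The function names are illustrative.

```python
import random


def p_site_in_random_dna(length_bp, site="GGATCC"):
    """Probability that a random sequence contains `site` at least once.

    Assumes equiprobable, independent bases and treats the length - k + 1
    start positions as independent trials (good for non-self-overlapping
    sites). GGATCC is palindromic, so checking one strand is sufficient."""
    k = len(site)
    p_hit = 0.25 ** k
    return 1.0 - (1.0 - p_hit) ** (length_bp - k + 1)


def simulate(length_bp, trials=5000, site="GGATCC", seed=1):
    """Fraction of random sequences of length_bp that contain `site`."""
    rng = random.Random(seed)
    hits = sum(site in "".join(rng.choices("ACGT", k=length_bp))
               for _ in range(trials))
    return hits / trials


for length in (400, 550):
    print(length, round(p_site_in_random_dna(length), 3), simulate(length))
```

Under these assumptions roughly 9% of 400-bp and 12% of 550-bp fragments carry a BamHI site, bracketing the "1 in 10" figure in the text.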
Fig. 2. Lane 1 shows the PCR products obtained by using degenerate primers designed from the consensus sequences flanking the catalytic histidine (5' His-primer) and the catalytic serine (3' Ser-primer). The products remaining between 400 and 550 bp after digestion with BamHI were reamplified by using the same degenerate primers. The products from this second PCR are shown in Lane 2.

Multiple expressed sequence tag sequences were found for the cDNA. Expressed sequence tag accessions aa459076, aa219372, and w39209 were used extensively for sequencing the cDNA starting from nucleotide 746 and 2,461-3,142, but no start codon was observed. A sequence was also found in GenBank (accession no. U20428). This sequence also lacks the 5' end of the cDNA but allowed amplification of cDNA from nucleotides 196-745. Rapid amplification of cDNA ends (RACE) (23) was used to obtain further 5' cDNA sequence. Application of RACE did not yield a clone containing the entire 5'-untranslated region, but the sequence obtained contained a stop codon in-frame with the Kozak start sequence (24), giving confidence that the full coding sequence of the cDNA has been obtained. The nucleotide sequence and predicted amino acid sequence are shown in Fig. 1.

The nucleotide sequence surrounding the proposed start 
codon matches the optimal sequence of ACCATGG for 
translation initiation sites proposed by Kozak (24). In addition, 
there is a stop codon in-frame with the putative start codon, 
which gives further evidence that initiation occurs at that site. 
The DNA sequence predicts an 855-aa mosaic protein com- 
posed of multiple domains (Fig. 3). The coding sequence does 
not contain a typical signal peptide but does contain a single
Fig. 3. The domain structure of human MT-SP1 is compared with the domain structure of enteropeptidase (47) and hepsin (25). SA, possible signal anchor; CUB, a repeat first identified in complement components C1r and C1s, the urchin embryonic growth factor, and bone morphogenetic protein 1 (27); L, LDLR repeat (29); SP, a chymotrypsin family serine protease domain (40); MAM, a domain homologous to members of a family defined by meprin, protein A5, and the protein tyrosine phosphatase µ (48); MSCR, a macrophage scavenger receptor cysteine-rich motif (29). The predicted disulfide linkages are shown labeled as C-C.
hydrophobic sequence of 26 residues (residues 55-81), which is flanked by a charged residue on
each side. This sequence may constitute a signal anchor sequence, similar to that observed in
other proteases, including hepsin (25) and enteropeptidase (26). Following the putative signal
anchor sequence are two complement factor 1r-urchin embryonic growth factor-bone morphogenetic
protein (CUB) domains (27), which are named after the proteins in which the modules were first
discovered: complement subcomponents C1s and C1r, urchin embryonic growth factor (Uegf), and
bone morphogenetic protein 1 (BMP1). CUB domains have conserved characteristics, which include
the presence of four cysteine residues and various conserved hydrophobic and aromatic positions
(27). The CUB domain, which has recently been characterized crystallographically (28), consists
of 10 β-strands that are organized into two 5-stranded β-sheets. Following the CUB domains are
four low-density lipoprotein receptor (LDLR) repeats (29), which are named after the receptor
ligand-binding repeats that are present in the LDLR. These repeats have a highly conserved
pattern and spacing of six cysteine residues that form three intramolecular disulfide bonds.
The final domain observed is the serine protease domain. The alignments of these domains with
other members of their respective classes are shown in Fig. 4.

Tissue Distribution of MT-SP1 mRNA. Northern blots of human poly(A)+ RNA, made by using a
1.…-kilobase fragment of MT-SP1 cDNA as a probe, show a ~3.3-kilobase band in epithelial tissues
including the prostate, kidney, lung, small intestine, stomach, colon, and placenta, as well as in
other tissues, including spleen, liver, leukocytes, and thymus. This band was not observed in
muscle, brain, ovary, or testis (Fig. 5). Similar experiments performed on a human cancer cell
line blot show that MT-SP1 is expressed in the colorectal adenocarcinoma SW480 but was not
observed in the promyelocytic leukemia HL-60, HeLa cell S3, chronic myelogenous leukemia K-562,
lymphoblastic leukemia MOLT-4, Burkitt's lymphoma Raji, lung carcinoma A549, or melanoma G361
lanes (data not shown). This 3.3-kilobase mRNA is slightly longer than the 3.1-kilobase sequence
presented in Fig. 5, suggesting that there may still be sequence in the 5'-untranslated region
that has not been identified.

Activation and Purification of His-MT-SP1 Protease Domain. The serine protease domain of MT-SP1
was expressed in E. coli as a His-tagged fusion and was purified from inclusion bodies under
denaturing conditions by using metal-chelate affinity chromatography. The yield of enzyme after
this step was ~3 mg of protein per liter of E. coli culture. This denatured protein refolded when
the urea was dialyzed from the protein. Surprisingly, the purified renatured protein showed a
time-dependent shift on an SDS/PAGE gel (Fig. 6A), with the lower fragment being the size of the
mature, processed enzyme lacking the His tag. N-terminal sequencing of the purified, activated
protease domain yielded the expected VVGGT activation sequence. When the refolded protein was
tested for activity by using the synthetic substrate Spectrozyme tPA, a time-dependent increase
in activity was observed (Fig. 6B). In contrast, the protease domain that contains the Ser→Ala
mutation showed neither a change in size on an SDS/polyacrylamide gel nor an increase in
enzymatic activity under identical conditions (data not shown), suggesting that the catalytic
serine is necessary for activation and that activation is not the result of a contaminating
protease. To show that the cleavage of the protease domain was a result of His-tagged MT-SP1
protease activity, the inactive Ser→Ala protease domain was treated with purified recombinant
enzyme (Fig. 6C). This treatment results in the formation of a cleavage product that corresponds
to the size of the active protease (Fig. 6C, lane 7). Untreated protease domain does not get
cleaved (Fig. 6C, lane 8). From these results, it is concluded that the protease autoactivates on
refolding. The activated protease was separated from inactive protein and other contaminants by
using affinity chromatography with p-aminobenzamidine resin. Purified protein was analyzed by
using SDS/PAGE, and no other contaminants were observed. Similarly, immunoblotting with
polyclonal antiserum against purified protease domain (raised in rabbits at Berkeley Antibody,
Richmond, CA) revealed one band. Under nonreducing conditions, the pro region is disulfide-linked
to the protease domain; thus, this purified protein was

directed against the N-terminal Arg-Gly-Ser-His epitope that is contained in the recombinant
protease domain, further indicating the purity and identity of the protein (data not

Kinetic Properties of Purified His-MT-SP1 Protease Domain. The enzyme concentration was
determined by using an active site titration with MUGB. The catalytic activity of the protease
domain was monitored by using pNA substrates.
FIG. 5. Tissue distribution of MT-SP1 mRNA levels. Northern blots of human poly(A)+ RNA from
assorted human tissues were hybridized with radiolabeled cDNA probes as described in Materials
and Methods. Upper shows hybridization by using an MT-SP1 1.…-kilobase cDNA fragment derived
from expressed sequence tag clone W39209 and exposed overnight. Lower shows the same blot after
being stripped and rehybridized with a human glyceraldehyde-3-phosphate dehydrogenase (GAPDH)
cDNA probe as a loading control, exposed for 2 hours. The mobility of RNA size standards is
indicated at the left.

Purified protease domain was tested for hydrolytic activity against tetrapeptide substrates of
the form Suc-AAPX-pNA, which contained various amino acids at the P1 position (P1 = Ala, Asp, Glu,
Phe, Leu, Met, Lys, or Arg). The only substrates with detectable activity were those with P1-Lys
or P1-Arg. The serine protease domain with the Ser→Ala mutation had no detectable activity. The
activity of the protease domain was further characterized by using the substrate Spectrozyme
tPA, yielding K_m = 31.4 ± 4.2 μM, k_cat = 2.6 × 10^… s^-1, and
k_cat/K_m = 6.9 × 10^… ± 2.3 × 10^… M^-1·s^-1. Ecotin inhibition of the MT-SP1 His-tagged protease
domain fits a tight-binding reversible inhibitory model (21, 22), as observed for ecotin
interaction with other serine proteases (…). Inhibition assays by using ecotin and ecotin
M84R/M85R yielded apparent K_i values of 782 ± 92 pM and 9.8 ± 1.5 pM, respectively.
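The constants above can be turned into predicted activities. Below is a minimal Python sketch with two pieces. The first is the Michaelis-Menten fraction of Vmax. The second is the Morrison quadratic, the standard closed form for tight-binding inhibition. Two things are assumptions: that references 21 and 22 used exactly this form, and the 1 nM enzyme and 10 nM inhibitor concentrations in the example, which are hypothetical.

```python
import math

def michaelis_menten_fraction(s: float, km: float) -> float:
    """Fraction of Vmax reached at substrate concentration s (same units as km)."""
    return s / (km + s)

def morrison_fractional_activity(e_t: float, i_t: float, ki_app: float) -> float:
    """Morrison tight-binding equation: v_i / v_0 for total enzyme e_t and total inhibitor i_t.

    All concentrations must share one unit (molar here). The quadratic accounts for
    inhibitor depletion, which matters when Ki is comparable to or below e_t.
    """
    b = e_t + i_t + ki_app
    ei = (b - math.sqrt(b * b - 4.0 * e_t * i_t)) / 2.0  # concentration of EI complex
    return 1.0 - ei / e_t

KM_SPECTROZYME = 31.4e-6  # M, Spectrozyme tPA, from the text
KI_ECOTIN = 782e-12       # M, apparent Ki, wild-type ecotin
KI_M84R_M85R = 9.8e-12    # M, apparent Ki, ecotin M84R/M85R

# Hypothetical assay: 1 nM enzyme, 10 nM inhibitor.
for name, ki in [("ecotin", KI_ECOTIN), ("M84R/M85R", KI_M84R_M85R)]:
    print(name, round(morrison_fractional_activity(1e-9, 10e-9, ki), 4))
```

Under these assumed conditions, wild-type ecotin leaves roughly 8% residual activity and the M84R/M85R variant about 0.1%. This is consistent with the variant's roughly 80-fold lower K_i.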

DISCUSSION 

Structural Motifs of MT-SP1. In this work, we characterize the expression of chymotrypsin-fold
proteases by PC-3 cells and clone a member of this family that we call MT-SP1. The name
membrane-type serine protease 1 (MT-SP1) is given to be consistent with the nomenclature of
membrane-type matrix metalloproteases (MT-MMPs; ref. 32). The cDNA likely encodes a
membrane-type protein because of the lack of a signal sequence and the presence of a putative SA
that is also seen in other membrane-type serine proteases: hepsin (25), enteropeptidase (26),
TMPRSS2 (32), and human airway trypsin-like protease (33). We propose that proteins that are
localized to the membrane through a SA and that encode a chymotrypsin-fold serine protease domain
be categorized in the MT-SP family. The membrane localization of MT-SP1 is supported by
immunofluorescence experiments that localize the protease domain to the extracellular cell
surface (unpublished results).

Following the putative SA are several domains that are thought to be involved in protein-protein
or protein-ligand interactions. For example, CUB domains can mediate protein-protein
interactions, as with the seminal plasma PSP-I/PSP-II heterodimer that is built by CUB-domain
interactions (28) and with procollagen C-proteinase enhancer protein and procollagen
C-proteinase (BMP-1) (34, 35). Interestingly, most of the proteins that contain CUB domains are
involved in developmental processes or in proteolytic cascades (27), which suggests that MT-SP1
may play a similar role. The four repeated motifs that follow the CUB domains are known as LDLR
ligand-binding repeats, named after the seven copies of repeats found in the LDLR. There are
several negatively charged amino acids between the fourth and sixth cysteines that are highly
conserved in the LDLR and are also seen in the LDLR repeats of MT-SP1. The conserved motif
Ser-Asp-Glu (residues 44-46 in Fig. 4) is known to be important for binding the positively
charged residues of the LDLR ligands apolipoprotein B-100 (ApoB-100) and ApoE (29). The
ligand-binding repeats of MT-SP1 most likely do not mediate interaction with ApoB-100 or ApoE but
may be involved in the interaction with other positively charged ligands. For example, LDLR
repeats in the LDLR-related protein have been implicated in the binding and recycling of
protease-inhibitor complexes such as uPA-plasminogen activator inhibitor-1 (PAI-1) complexes
(reviewed in refs. 36 and 37). It also has been shown that the pro domain of enteropeptidase is
involved in interactions with its substrate trypsinogen, allowing 520-fold greater catalytic
efficiency in the cleavage compared with the protease domain alone (38). By analogy, similar
interactions should occur between MT-SP1 and its substrates. Thus, further investigation of
MT-SP1 CUB domain or LDLR repeat interactions may yield insight into the function of this protein.

The amino acid sequence of the serine protease domain of MT-SP1 is highly homologous to other
proteases found in the family (Fig. 4). The essential features of a functional serine protease
are contained in the deduced amino acid sequence of
Autoactivation of MT-SP1
FIG. 6. Activation and purification of His-tagged MT-SP1 protease domain. A representative
experiment is shown in A and B. (A) … was monitored by using SDS/PAGE. The upper band represents
inactivated protease domain, and the lower band represents active protease (also verified by
N-terminal sequencing). (B) The activation of the protein was monitored by using Spectrozyme tPA
as a synthetic substrate for the protease domain. (C) The inactive Ser→Ala protease domain is
cleaved with 10 nM activated His-tagged MT-SP1 protease domain at 37°C. The specific cleavage of
active MT-SP1 protease domain is required for proper processing at the activation site. Active
protease domain is shown in lane 7 (+), and no cleavage of the untreated inactive protease domain
is observed (lane 8).
the domain. The residues that comprise the catalytic triad, His-656, Asp-711, and Ser-805,
corresponding to His-57, Asp-102, and Ser-195 in chymotrypsin, are observed in MT-SP1 (for
reviews, see refs. 39 and 40). The sequence Ser-214-Trp-215-Gly-216 (Ser-825-Trp-826-Gly-827),
which is thought to interact with the side chains of the substrate for properly orienting the
scissile bond, is present. Gly-193 (Gly-803) and Gly-196 (Gly-806), which are thought to be
necessary for proper orientation of Ser-195 (Ser-805), also are present. Based on homology to
chymotrypsin, three disulfide bonds are predicted to form within the protease domain at
Cys-44-Cys-58, Cys-168-Cys-182, and Cys-191-Cys-220 (Cys-643-Cys-657, Cys-776-Cys-790, and
Cys-801-Cys-830), and a fourth disulfide bond should form between the catalytic domain and the
pro domain, Cys-122-Cys-1 (Cys-731-Cys-604), as observed for chymotrypsin. This predicted
disulfide with the pro domain suggests that the active catalytic domain should still be localized
to the cell surface via a disulfide linkage. The presence of the catalytic machinery and other
conserved structural components described above suggests that all features necessary for
proteolytic activity are present in the encoded sequence.
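Chymotrypsin-to-MT-SP1 numbering pairs like these are easy to mistype. One way to catch errors is to check each pair against the sequence itself. Below is a minimal Python sketch. It assumes residues 801-830 transcribed from the sequence figure (SEQ ID NO: 2). It checks only the pairs that the text states unambiguously, and it also locates the GDSGGP motif that surrounds the catalytic serine in chymotrypsin-fold proteases. The helper names are hypothetical.

```python
# Residues 801-830 of MT-SP1 as transcribed from the sequence figure (SEQ ID NO: 2).
SEGMENT_START = 801
SEGMENT = "CQGDSGGPLSSVEADGRIFQAGVVSWGDGC"

# (one-letter residue, chymotrypsin number) -> MT-SP1 position, as stated in the text.
CHYMO_TO_MTSP1 = {
    ("C", 191): 801,
    ("G", 193): 803,
    ("S", 195): 805,
    ("G", 216): 827,
    ("C", 220): 830,
}

def residue_at(position: int) -> str:
    """One-letter residue at an MT-SP1 position inside the transcribed segment."""
    offset = position - SEGMENT_START
    if not 0 <= offset < len(SEGMENT):
        raise IndexError(f"position {position} is outside residues 801-830")
    return SEGMENT[offset]

def mismatches() -> list:
    """Every mapping whose expected residue differs from the transcribed sequence."""
    return [(aa, chymo, pos, residue_at(pos))
            for (aa, chymo), pos in CHYMO_TO_MTSP1.items()
            if residue_at(pos) != aa]

print(mismatches())                                   # empty list if all pairs agree
print(SEGMENT.find("GDSGGP") + SEGMENT_START)         # first residue of the motif
```

The offset between the two numbering schemes is not constant across the whole domain, because chymotrypsin numbering absorbs insertions and deletions. This is why the sketch stores explicit pairs instead of a single offset.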

Substrate Specificity of the MT-SP1 Protease Domain. The S1 site specificity (41) of a protease
is largely determined by the amino acid residue at position 189. This position is occupied by an
aspartate in MT-SP1, suggesting that the protease has specificity for Arg/Lys in the P1 position.
In addition, the presence of a polar Gln-192 (Gln-802), as in trypsin, is consistent with basic
specificity. Furthermore, the presence of Gly-216 (Gly-827) and Gly-226 (Gly-837) is consistent
with the presence of a deep S1 pocket, unlike elastase, which has Val-216 and Thr-226 that block
the pocket and thereby contribute to the P1 specificity for small hydrophobic side chains. The
specificity at the other subsites is largely dependent on the nature of the seven loops A-E and
loops 2 and 3 (Fig. 4). Loop C in enterokinase has a number of positively charged residues that
are thought to interact with the negatively charged activation site in trypsinogen,
Asp-Asp-Asp-Asp-Lys (26). One known substrate for MT-SP1 (as described below) is the activation
site of MT-SP1, which is Arg-Gln-Ala-Arg (residues 611-614). Loop C contains two Asp residues that
may participate in the recognition of the activation sequence.
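The S1-pocket argument above is a short decision rule. Below is a minimal Python sketch that restates it as code. It is deliberately coarse: it covers only the three pocket types discussed (MT-SP1/trypsin-like, chymotrypsin-like, and elastase-like) and returns "unclassified" for anything else. The chymotrypsin and elastase residue-189 identities (Ser in both) are textbook values assumed here. The function name is hypothetical.

```python
def predicted_p1_preference(res189: str, res216: str, res226: str) -> str:
    """Coarse S1-pocket heuristic using residues 189, 216 and 226 (chymotrypsin numbering)."""
    res189, res216, res226 = res189.upper(), res216.upper(), res226.upper()
    deep_pocket = (res216, res226) == ("G", "G")   # glycines leave the pocket open
    if deep_pocket and res189 == "D":
        return "Arg/Lys"                           # Asp-189 pairs with a basic side chain
    if deep_pocket and res189 == "S":
        return "large hydrophobic"                 # chymotrypsin-like pocket
    if (res216, res226) == ("V", "T"):
        return "small hydrophobic"                 # Val/Thr occlude the pocket, elastase-like
    return "unclassified"

examples = {
    "MT-SP1 (text)": ("D", "G", "G"),
    "chymotrypsin (assumed)": ("S", "G", "G"),
    "elastase (189 assumed)": ("S", "V", "T"),
}
for name, residues in examples.items():
    print(name, predicted_p1_preference(*residues))
```

Real specificity also depends on the loops and extended subsites mentioned above, so a rule like this only restates the first-order argument.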

One means of obtaining further data on substrate specificity is by characterization of the
activity of the recombinant proteolytic domain. Enterokinase has been characterized from both
recombinant (38, 42) and native (43, 44) sources. However, proteolytic activity for the other
reported membrane-type serine proteases hepsin (25) and TMPRSS2 (32) is only predicted based on
sequence homology. To produce active recombinant MT-SP1, a His-tagged fusion of the protease
domain was cloned into an E. coli vector, expressed, and purified to homogeneity. Fortuitously,
the protease domain refolded and autoactivated after resuspension and purification from
inclusion bodies. This activity, coupled with the lack of activity in the Ser-195→Ala
(Ser-805→Ala) variant, demonstrates that the cDNA encodes a catalytically proficient protease.
Autoactivation of the protease domain at the arginine-valine site (Arg-614-Val-615) shows that
the protease has Arg/Lys specificity, as predicted by the sequence homology to other proteases of
basic specificity. Specificity and selectivity are confirmed by the lack of cleavage of AAPX-pNA
substrates that do not have X = R or K. Further characterization with Spectrozyme tPA revealed an
active enzyme with k_cat = 2.6 × 10^… s^-1. However, the His-tagged serine protease domain does
not cleave H-Arg-pNA, showing that, unlike trypsin, there is a requirement for additional
subsite occupation for catalytic activity. This suggests that the enzyme is involved in a
regulatory role that requires selective processing of particular substrates rather than
nonselective degradation.
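Autoactivation at Arg-614 can be pictured as locating basic P1 candidates and splitting the chain at the scissile bond. Below is a minimal Python sketch. The 10-residue activation region is assembled from three sources: Arg-Gln-Ala-Arg (611-614) from the text, VVGGT from the N-terminal sequencing result, and residue 620 (Asp) from the sequence figure. Treat that assembly as an assumption. The helper names are hypothetical.

```python
from typing import List, Tuple

# Residues 611-620 of MT-SP1 (assembled from the text and the sequence figure; see lead-in).
ACTIVATION_REGION = "RQARVVGGTD"

def basic_p1_sites(peptide: str, first_residue: int) -> List[int]:
    """Protein-numbered positions of Arg/Lys residues that could occupy P1."""
    return [first_residue + i for i, aa in enumerate(peptide.upper()) if aa in "RK"]

def cleave_after(peptide: str, first_residue: int, p1_position: int) -> Tuple[str, str]:
    """Split a peptide at the scissile bond between P1 and P1'."""
    cut = p1_position - first_residue + 1
    return peptide[:cut], peptide[cut:]

print(basic_p1_sites(ACTIVATION_REGION, 611))       # candidate P1 residues
print(cleave_after(ACTIVATION_REGION, 611, 614))    # cut at Arg-614 / Val-615
```

Both Arg-611 and Arg-614 are basic P1 candidates. Only the cut after Arg-614 yields the VVGGT N terminus observed by sequencing, which is why the text assigns the activation cleavage to the Arg-614-Val-615 bond.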

MT-SP1 Function. In other studies, we have found that inhibition of serine protease activity by
ecotin or ecotin M84R/M85R inhibits testosterone-induced branching ductal morphogenesis and
enhances apoptosis in a rat ventral prostate model (F. Elfman, T.T., C.S.C., G. Cunha, and M.A.S.,
unpublished results). Moreover, the rat homolog of MT-SP1 is expressed in the normal rat ventral
prostate (data not shown). Assays of the protease domain with ecotin and ecotin M84R/M85R showed
that the enzymatic activity is strongly inhibited (K_i = 782 ± 92 pM and 9.8 ± 1.5 pM,
respectively), suggesting that rat MT-SP1 is likely to be inhibited at the concentrations of
these inhibitors used in our experiments. MT-SP1 inhibition may result in the observed inhibition
of differentiation and/or increased apoptosis. Future studies are aimed at definitively resolving
the role of MT-SP1 in prostate differentiation. The broad expression of MT-SP1 in epithelial
tissues is consistent with the possibility that it is involved in cell maintenance or growth,
perhaps by activating growth factors or by processing prohormones.

MT-SP1 may participate in a proteolytic cascade that results in cell growth and/or
differentiation. Another structurally similar membrane-type serine protease, enteropeptidase
(Fig. 3), is involved in a proteolytic cascade in which activation of trypsinogen leads to
activation of downstream intestinal proteases (5). Enteropeptidase is expressed only in the
enterocytes of the proximal small intestine, thus precisely restricting activation of
trypsinogen. Thus, in contrast to secreted proteases that may diffuse throughout the organism, the
membrane association of MT-SP1 should also allow the proteolytic activity to be precisely
localized, which may be important for proper physiological function; improper localization of
the enzyme, or improper levels of downstream substrates, could lead to disease.

We have found that subcutaneous coinjection of PC-3 cells with wild-type ecotin or ecotin
M84R/M85R led to a decrease in primary tumor size compared with animals in which PC-3 cells and
saline were injected (O. Melnyk, T.T., C.S.C., and M.A.S., unpublished results). Because wild-type
ecotin is a poor, micromolar inhibitor of uPA, serine proteases other than uPA are likely involved
in this primary tumor proliferation. Both wild-type ecotin and ecotin M84R/M85R are potent,
subnanomolar inhibitors of MT-SP1, raising the possibility that MT-SP1 plays an important role in
progression of epithelial cancers expressing this protease.

Direct biochemical isolation of the substrates may be possible if MT-SP1 adhesive domains such as
the CUB domains or LDLR repeats interact with the substrates. In addition, likely substrates may
be predicted and tested by using knowledge of extended enzyme specificity. For example, the
characterization of the substrate specificity of granzyme B allowed the prediction and
confirmation of substrates for this serine protease (45). Thus, these complementary studies
should further shed light on the physiological function of this enzyme.
[Figure: amino acid sequence of the protease (SEQ ID NO: 2), residues 1-855 in rows of 50, annotated with conserved cysteine residues, possible N-linked glycosylation sites, conserved SDE motifs, the potential cleavage site, and the conserved catalytic-triad residues H, D, and S. Domains: 1, cytoplasmic domain; 2, transmembrane domain; 3, CUB repeat; 4, ligand-binding repeat (class A motif) of LDL receptor-like domain; 5, serine protease.]
TADG-15: AN EXTRACELLULAR SERINE
PROTEASE OVEREXPRESSED IN BREAST
AND OVARIAN CARCINOMAS

BACKGROUND OF THE INVENTION

1. Field of the Invention 

The present invention relates generally to the fields of cellular biology and the diagnosis of
neoplastic disease. More specifically, the present invention relates to an extracellular serine
protease termed Tumor Antigen Derived Gene-15 (TADG-15), which is overexpressed in breast and
ovarian carcinomas.

2. Description of the Related Art 

Extracellular proteases have been directly associated with tumor growth, shedding of tumor cells
and invasion of target organs. Individual classes of proteases are involved in, but not limited
to, (1) the digestion of stroma surrounding the initial tumor area, (2) the digestion of the
cellular adhesion molecules to allow dissociation of tumor cells, and (3) the invasion of the
basement membrane for metastatic growth and the activation of both tumor growth factors and
angiogenic factors.

The prior art is deficient in the lack of effective means of 
screening to identify proteases overexpressed in carcinoma. 
The present invention fulfills this longstanding need and 
desire in the art. 

SUMMARY OF THE INVENTION

The present invention discloses a screening program to identify proteases overexpressed in
carcinoma by examining PCR products amplified using differential display in early-stage and
metastatic tumors compared with normal tissues.

In one embodiment of the present invention, there is provided a DNA encoding a TADG-15 protein
selected from the group consisting of: (a) isolated DNA which encodes a TADG-15 protein;
(b) isolated DNA which hybridizes to isolated DNA of (a) above and which encodes a TADG-15
protein; and (c) isolated DNA differing from the isolated DNAs of (a) and (b) above in codon
sequence due to the degeneracy of the genetic code, and which encodes a TADG-15 protein.

In another embodiment of the present invention, there is provided a vector capable of expressing
the DNA of the present invention adapted for expression in a recombinant cell and regulatory
elements necessary for expression of the DNA in the cell.

In yet another embodiment of the present invention, there 
is provided a host cell transfected with the vector of the 
present invention, the vector expressing a TADG-15 protein. 

In still yet another embodiment of the present invention, 
there is provided a method of detecting expression of a 
TADG-15 mRNA, comprising the steps of: (a) contacting 
mRNA obtained from the cell with the labeled hybridization 
probe; and (b) detecting hybridization of the probe with the 
mRNA. 

Other and further aspects, features, and advantages of the present invention will be apparent
from the following description of the presently preferred embodiments of the invention given for
the purpose of disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above-recited features, advantages and objects of the invention,
as well as others which will become clear, are attained and can be understood in detail, more
particular descriptions of the invention briefly summarized above may be had by reference to
certain embodiments thereof which are illustrated in the appended drawings. These drawings form a
part of the specification. It is to be noted, however, that the appended drawings illustrate
preferred embodiments of the invention and therefore are not to be considered limiting in their
scope.

FIG. 1 shows a comparison of PCR products derived from normal and breast carcinoma cDNA as shown
by staining in an agarose gel.

FIG. 2 shows a comparison of the serine protease catalytic 
domain of TADG-15 with hepsin (Heps, SEQ ID No: 3), 
(Scce, SEQ ID No: 4), trypsin (Try, SEQ ID No: 5),
chymotrypsin (Chymb, SEQ ID No: 6), factor 7 (Fac7, SEQ 
ID No: 7) and tissue plasminogen activator (Tpa, SEQ ID 
No: 8). The asterisks indicate conserved amino acids of 
catalytic triad. 

FIG. 3 shows quantitative PCR analysis of TADG-15 expression.

FIG. 4 shows the ratio of TADG-15 expression to expression of β-tubulin in normal tissues, low
malignant potential tumors (LMP) and carcinomas.

FIG. 5 shows the TADG-15 expression in tumor cell lines 
derived from both ovarian and breast carcinoma tissues. 

FIG. 6 shows the overexpression of TADG-15 in other 
tumor tissues. 

FIG. 7 shows the Northern blots of TADG-15 expression 
in ovarian carcinomas, fetal and normal adult tissues. 

FIG. 8 shows a diagram of the TADG-15 transcript and 
the clones with the origin of their derivation. 

FIG. 9 shows nucleotide sequence of the TADG-15 cDNA 
(SEQ ID No: 1) and amino acid sequence of the TADG-15 
protein (SEQ ID No: 2).

FIG. 10 shows the amino acid sequence of the TADG-15 
protease including functional sites and domains. 

FIG. 11 shows a structure diagram of the TADG-15 
protein including functional domains. 

FIG. 12 shows a nucleotide sequence comparison between TADG-15 and human SNC-19 (GenBank
accession #U20428).

DETAILED DESCRIPTION OF THE 
INVENTION 

As used herein, the term "cDNA" shall refer to the DNA 
copy of the mRNA transcript of a gene. 

As used herein, the term "derived amino acid sequence" 
shall mean the amino acid sequence determined by reading 
the triplet sequence of nucleotide bases in the cDNA. 
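"Reading the triplet sequence" can be made concrete with the standard genetic code. Below is a minimal Python sketch. It builds the 64-codon table from the usual TCAG ordering and translates a cDNA in a chosen frame. Two behaviours are assumptions of this sketch, not part of the definition: codons containing ambiguous bases are rendered as "X", and stop codons as "*".

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

def translate(cdna: str, frame: int = 0) -> str:
    """Derived amino acid sequence of a cDNA read in the given frame (0, 1 or 2)."""
    seq = cdna.upper().replace("U", "T")   # accept mRNA-style input as well
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

print(translate("ATGGCCTAA"))   # Met-Ala-stop
```

Any trailing one or two bases that do not complete a codon are ignored.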

As used herein the term "screening a library" shall refer 
to the process of using a labeled probe to check whether, 
under the appropriate conditions, there is a sequence 
complementary to the probe present in a particular DNA 
library. In addition, "screening a library" could be performed by PCR.

As used herein, the term "PCR" refers to the polymerase chain reaction that is the subject of
U.S. Pat. Nos. 4,683,195 and 4,683,202 to Mullis, as well as other improvements now known in the
art.

The TADG-15 cDNA is 3147 base pairs long (SEQ ID No: 1) and encodes an 855-amino-acid protein
(SEQ ID No: 2). The availability of the TADG-15 gene opens the way for a number of studies that
can lead to various applications. For example, the TADG-15 gene can be used as a diagnostic or
therapeutic target in ovarian carcinoma and other carcinomas, including breast, prostate, lung
and colon.

In accordance with the present invention there may be 
employed conventional molecular biology, microbiology, 
and recombinant DNA techniques within the skill of the art. 
Such techniques are explained fully in the literature. See, 
e.g., Maniatis, Fritsch & Sambrook, "Molecular Cloning: A Laboratory Manual" (1982); "DNA
Cloning: A Practical Approach," Volumes I and II (D. N. Glover ed. 1985); "Oligonucleotide
Synthesis" (M. J. Gait ed. 1984); "Nucleic Acid Hybridization" [B. D. Hames & S. J. Higgins eds.
(1985)]; "Transcription and Translation" [B. D. Hames & S. J. Higgins eds. (1984)]; "Animal Cell
Culture" [R. I. Freshney, ed. (1986)]; "Immobilized Cells And Enzymes" [IRL Press, (1986)];
B. Perbal, "A Practical Guide To Molecular Cloning" (1984).

Therefore, if appearing herein, the following terms shall 
have the definitions set out below. 

The amino acids described herein are preferably in the "L" isomeric form. However, residues in
the "D" isomeric form can be substituted for any L-amino acid residue, as long as the desired
functional property of immunoglobulin-binding is retained by the polypeptide. NH2 refers to the
free amino group present at the amino terminus of a polypeptide. COOH refers to the free carboxy
group present at the carboxy terminus of a polypeptide. In keeping with standard polypeptide
nomenclature, J. Biol. Chem., 243:3552-59 (1969), abbreviations for amino acid residues are shown
in the following Table of Correspondence:



TABLE OF CORRESPONDENCE

        SYMBOL
1-Letter    3-Letter    AMINO ACID
Y           Tyr         tyrosine
G           Gly         glycine
F           Phe         phenylalanine
M           Met         methionine
A           Ala         alanine
S           Ser         serine
I           Ile         isoleucine
L           Leu         leucine
T           Thr         threonine
V           Val         valine
P           Pro         proline
K           Lys         lysine
H           His         histidine
Q           Gln         glutamine
E           Glu         glutamic acid
W           Trp         tryptophan
R           Arg         arginine
D           Asp         aspartic acid
N           Asn         asparagine
C           Cys         cysteine



It should be noted that all amino-acid residue sequences 
are represented herein by formulae whose left and right 
orientation is in the conventional direction of amino- 
terminus to carboxy-terminus. Furthermore, it should be 
noted that a dash at the beginning or end of an amino acid 
residue sequence indicates a peptide bond to a further 
sequence of one or more amino-acid residues. The above 
Table is presented to correlate the three-letter and one-letter 
notations which may appear alternately herein. 
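The table above is a lookup from three-letter to one-letter codes. Below is a minimal Python sketch that converts dash-joined three-letter notation (amino terminus on the left, as stated above) to one-letter form. A leading or trailing dash marks a peptide bond to further sequence under the convention above. The sketch simply ignores such dashes, which is an assumption about how that notation should be handled.

```python
THREE_TO_ONE = {
    "Tyr": "Y", "Gly": "G", "Phe": "F", "Met": "M", "Ala": "A", "Ser": "S",
    "Ile": "I", "Leu": "L", "Thr": "T", "Val": "V", "Pro": "P", "Lys": "K",
    "His": "H", "Gln": "Q", "Glu": "E", "Trp": "W", "Arg": "R", "Asp": "D",
    "Asn": "N", "Cys": "C",
}

def to_one_letter(sequence: str) -> str:
    """Convert e.g. 'Arg-Gln-Ala-Arg' (N- to C-terminus) to 'RQAR'."""
    residues = [res.strip() for res in sequence.split("-") if res.strip()]
    # capitalize() normalizes case, so 'ser' and 'SER' both map to 'Ser'.
    return "".join(THREE_TO_ONE[res.capitalize()] for res in residues)

print(to_one_letter("Asp-Asp-Asp-Asp-Lys"))
```

An unknown abbreviation raises `KeyError`, which is deliberate: a silent placeholder would hide transcription errors.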

A "replicon" is any genetic element (e.g., plasmid, 
chromosome, virus) that functions as an autonomous unit of 
DNA replication in vivo; i.e., capable of replication under its 
own control. 

A "vector" is a replicon, such as plasmid, phage or
cosmid, to which another DNA segment may be attached so 
as to bring about the replication of the attached segment. 
A "DNA molecule" refers to the polymeric form of deoxyribonucleotides (adenine, guanine, thymine,
or cytosine) in either its single-stranded form or a double-stranded helix. This term refers only
to the primary and secondary structure of the molecule, and does not limit it to any particular
tertiary forms. Thus, this term includes double-stranded DNA found, inter alia, in linear DNA
molecules (e.g., restriction fragments), viruses, plasmids, and chromosomes. In discussing the
structure herein, sequences are given according to the normal convention of giving only the
sequence in the 5' to 3' direction along the nontranscribed strand of DNA (i.e., the strand
having a sequence homologous to the mRNA).

An "origin of replication" refers to those DNA sequences 
that participate in DNA synthesis. 

A DNA "coding sequence" is a double-stranded DNA 
sequence which is transcribed and translated into a polypep- 
tide in vivo when placed under the control of appropriate 
regulatory sequences. The boundaries of the coding 
sequence are determined by a start codon at the 5' (amino) 
terminus and a translation stop codon at the 3' (carboxyl)
terminus. A coding sequence can include, but is not limited 
to, prokaryotic sequences, cDNA from eukaryotic mRNA, 
genomic DNA sequences from eukaryotic (e.g., 
mammalian) DNA, and even synthetic DNA sequences. A 
polyadenylation signal and transcription termination 
sequence will usually be located 3' to the coding sequence. 
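Because a coding sequence is bounded by a start codon and a stop codon, it can be located by scanning the three forward reading frames. Below is a minimal Python sketch that returns the longest ATG-to-stop stretch on the given strand. It assumes a plain DNA string. It ignores the reverse strand and any open reading frame that runs off the end without a stop. The function names are hypothetical.

```python
from typing import Optional, Tuple

STOPS = ("TAA", "TAG", "TGA")

def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """(start, end) of the longest ATG...stop ORF, 0-based, end exclusive, stop included."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the frame
            elif start is not None and codon in STOPS:
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None                   # look for the next ATG in this frame
    return best

def coding_length_nt(n_residues: int) -> int:
    """Nucleotides needed for n_residues codons plus one stop codon."""
    return 3 * n_residues + 3

print(longest_orf("CCATGAAATTTTAGGG"))   # toy sequence: ATG AAA TTT TAG
print(coding_length_nt(855))
```

For the 855-residue TADG-15 protein described above, the coding sequence including the stop codon spans 3 × 855 + 3 = 2568 nt of the 3147-bp cDNA. The remainder is untranslated sequence.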

Transcriptional and translational control sequences are 
DNA regulatory sequences, such as promoters, enhancers, 
polyadenylation signals, terminators, and the like, that pro- 
vide for the expression of a coding sequence in a host cell. 

A "promoter sequence" is a DNA regulatory region 
capable of binding RNA polymerase in a cell and initiating 
transcription of a downstream (3' direction) coding
sequence. For purposes of defining the present invention, the 
promoter sequence is bounded at its 3' terminus by the 
transcription initiation site and extends upstream (5' 
direction) to include the minimum number of bases or 
elements necessary to initiate transcription at levels detect- 
able above background. Within the promoter sequence will 
be found a transcription initiation site, as well as protein 
binding domains (consensus sequences) responsible for the 
binding of RNA polymerase. Eukaryotic promoters often, 
but not always, contain "TATA" boxes and "CAT" boxes. 
Prokaryotic promoters contain Shine-Dalgarno sequences in
addition to the -10 and -35 consensus sequences. 

An "expression control sequence" is a DNA sequence that 
controls and regulates the transcription and translation of 
another DNA sequence. A coding sequence is "under the 
control" of transcriptional and translational control 
sequences in a cell when RNA polymerase transcribes the 
coding sequence into mRNA, which is then translated into 
the protein encoded by the coding sequence.

A "signal sequence" can be included near the coding 
sequence. This sequence encodes a signal peptide, 
N-terminal to the polypeptide, that communicates to the host 
cell to direct the polypeptide to the cell surface or secrete the 
polypeptide into the media, and this signal peptide is clipped 
off by the host cell before the protein leaves the cell. Signal 
sequences can be found associated with a variety of proteins 
native to prokaryotes and eukaryotes. 

The term "oligonucleotide", as used herein in referring to 
the probe of the present invention, is defined as a molecule 
comprised of two or more ribonucleotides, preferably more 
than three. Its exact size will depend upon many factors 
which, in turn, depend upon the ultimate function and use of
the oligonucleotide. 
The term "primer" as used herein refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand, is induced, i.e., in the presence of nucleotides and an inducing agent such as a DNA polymerase and at a suitable temperature and pH. The primer may be either single-stranded or double-stranded and must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent. The exact length of the primer will depend upon many factors, including temperature, source of primer and use of the method. For example, for diagnostic applications, depending on the complexity of the target sequence, the oligonucleotide primer typically contains 15-25 or more nucleotides, although it may contain fewer nucleotides.

The primers herein are selected to be "substantially" complementary to different strands of a particular target DNA sequence. This means that the primers must be sufficiently complementary to hybridize with their respective strands. Therefore, the primer sequence need not reflect the exact sequence of the template. For example, a non-complementary nucleotide fragment may be attached to the 5' end of the primer, with the remainder of the primer sequence being complementary to the strand. Alternatively, non-complementary bases or longer sequences can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the sequence to hybridize therewith and thereby form the template for the synthesis of the extension product.
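The point that a primer may carry a non-complementary 5' tail, provided the remainder pairs with its strand, can be illustrated with a short Python sketch. This is an illustration only. The function names, the fixed-length 3' "core", and the example sequences (including the 6-base tail) are hypothetical and are not taken from this disclosure. Real hybridization also depends on temperature and salt, which the sketch ignores.

```python
# Minimal sketch: how well does the 3' portion of a primer pair with a template?
# All sequences are written 5'->3'; names and example sequences are hypothetical.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def min_mismatches_3prime(primer: str, template: str, core_len: int) -> int:
    """Fewest mismatches between the primer's 3'-terminal core_len bases
    and any site on the template strand that they could pair with.

    The 5' part of the primer (a "tail") is ignored, mirroring the statement
    that a non-complementary fragment may be attached at the 5' end.
    """
    core = primer.upper()[-core_len:]
    site = reverse_complement(core)  # template sequence the core base-pairs with
    template = template.upper()
    k = len(site)
    if k > len(template):
        raise ValueError("core is longer than the template")
    return min(
        sum(a != b for a, b in zip(site, template[i:i + k]))
        for i in range(len(template) - k + 1)
    )


# Hypothetical 14-mer primer: 6-base 5' tail + 8-base complementary core.
primer = "GGATCC" + "TTACGTTG"
print(min_mismatches_3prime(primer, "TTTTCAACGTAAAA", 8))  # 0: perfect core match
print(min_mismatches_3prime(primer, "TTTTCAACGTTAAA", 8))  # 1: one interspersed mismatch
```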

As used herein, the terms "restriction endonucleases" and "restriction enzymes" refer to enzymes, each of which cuts double-stranded DNA at or near a specific nucleotide sequence.
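Because each restriction enzyme acts at or near a specific nucleotide sequence, locating candidate cut sites amounts to a string search. The Python sketch below uses the well-known EcoRI recognition site GAATTC, cut between G and A on the top strand, purely as an example. The scanned sequence is hypothetical. The sketch also ignores methylation sensitivity, enzymes that cut outside their recognition site, and degenerate recognition sequences.

```python
# Minimal sketch: top-strand cut positions for an enzyme with a fixed,
# non-degenerate recognition site. EcoRI (G^AATTC) is used as the example.


def cut_positions(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return 0-based positions in seq where the top strand would be cut.

    cut_offset is the number of site bases lying 5' of the cut (1 for G^AATTC).
    Overlapping occurrences are reported as well.
    """
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq_len: int, cuts: list[int]) -> list[int]:
    """Lengths of the linear fragments produced by cutting at the given positions."""
    bounds = [0, *sorted(cuts), seq_len]
    return [b - a for a, b in zip(bounds, bounds[1:])]


example = "ATGAATTCCGTAGGAATTCTT"  # hypothetical 21-base sequence
cuts = cut_positions(example, "GAATTC", 1)
print(cuts)                                  # [3, 14]
print(fragment_lengths(len(example), cuts))  # [3, 11, 7]
```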

A cell has been "transformed" by exogenous or heterologous DNA when such DNA has been introduced inside the cell. The transforming DNA may or may not be integrated (covalently linked) into the genome of the cell. In prokaryotes, yeast, and mammalian cells for example, the transforming DNA may be maintained on an episomal element such as a plasmid. With respect to eukaryotic cells, a stably transformed cell is one in which the transforming DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones comprised of a population of daughter cells containing the transforming DNA. A "clone" is a population of cells derived from a single cell or ancestor by mitosis. A "cell line" is a clone of a primary cell that is capable of stable growth in vitro for many generations.

Two DNA sequences are "substantially homologous" when at least about 75% (preferably at least about 80%, and most preferably at least about 90% or 95%) of the nucleotides match over the defined length of the DNA sequences. Sequences that are substantially homologous can be identified by comparing the sequences using standard software available in sequence data banks, or in a Southern hybridization experiment under, for example, stringent conditions as defined for that particular system. Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Maniatis et al., supra; DNA Cloning, Vols. I & II, supra; Nucleic Acid Hybridization, supra.

A "heterologous" region of the DNA construct is an identifiable segment of DNA within a larger DNA molecule that is not found in association with the larger molecule in nature. Thus, when the heterologous region encodes a mammalian gene, the gene will usually be flanked by DNA that does not flank the mammalian genomic DNA in the genome of the source organism. Another example of a heterologous coding sequence is a construct in which the coding sequence itself is not found in nature (e.g., a cDNA where the genomic coding sequence contains introns, or synthetic sequences having codons different from the native gene). Allelic variations or naturally-occurring mutational events do not give rise to a heterologous region of DNA as defined herein.

The labels most commonly employed for these studies are 
radioactive elements, enzymes, chemicals which fluoresce 
when exposed to ultraviolet light, and others. A number of 
fluorescent materials are known and can be utilized as labels. 
These include, for example, fluorescein, rhodamine, 
auramine, Texas Red, AMCA blue and Lucifer Yellow. A 
particular detecting material is anti-rabbit antibody prepared 
in goats and conjugated with fluorescein through an isothio- 
cyanate. 

Proteins can also be labeled with a radioactive element or with an enzyme. The radioactive label can be detected by any of the currently available counting procedures. The preferred isotope may be selected from ³H, ¹⁴C, ³²P, ³⁵S, ³⁶Cl, ⁵¹Cr, ⁵⁷Co, ⁵⁸Co, ⁵⁹Fe, ⁹⁰Y, ¹²⁵I, ¹³¹I, and ¹⁸⁶Re.

Enzyme labels are likewise useful, and can be detected by any of the presently utilized colorimetric, spectrophotometric, fluorospectrophotometric, amperometric or gasometric techniques. The enzyme is conjugated to the selected particle by reaction with bridging molecules such as carbodiimides, diisocyanates, glutaraldehyde and the like. Many enzymes which can be used in these procedures are known and can be utilized. The preferred enzymes are peroxidase, β-glucuronidase, β-D-glucosidase, β-D-galactosidase, urease, glucose oxidase plus peroxidase and alkaline phosphatase. U.S. Pat. Nos. 3,654,090, 3,850,752, and 4,016,043 are referred to by way of example for their disclosure of alternate labeling material and methods.

A particular assay system developed and utilized in the art is known as a receptor assay. In a receptor assay, the material to be assayed is appropriately labeled and then certain cellular test colonies are inoculated with a quantity of both the labeled and unlabeled material, after which binding studies are conducted to determine the extent to which the labeled material binds to the cell receptors. In this way, differences in affinity between materials can be ascertained.

An assay useful in the art is known as a "cis/trans" assay. 
Briefly, this assay employs two genetic constructs, one of 
which is typically a plasmid that continually expresses a 
particular receptor of interest when transfected into an 
appropriate cell line, and the second of which is a plasmid 
that expresses a reporter such as luciferase, under the control 
of a receptor/ligand complex. Thus, for example, if it is 
desired to evaluate a compound as a ligand for a particular
receptor, one of the plasmids would be a construct that 
results in expression of the receptor in the chosen cell line, 
while the second plasmid would possess a promoter linked 
to the luciferase gene in which the response element to the 
particular receptor is inserted. If the compound under test is 
an agonist for the receptor, the ligand will complex with the 
receptor, and the resulting complex will bind the response 
element and initiate transcription of the luciferase gene. The 
resulting chemiluminescence is then measured 
photometrically, and dose response curves are obtained and 
compared to those of known ligands. The foregoing protocol 
is described in detail in U.S. Pat. No. 4,981,784. 
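The comparison of dose response curves can be made concrete by estimating, for each compound, the dose that gives a half-maximal luciferase signal (EC50). The Python sketch below is an illustration under stated assumptions and is not the protocol of U.S. Pat. No. 4,981,784. It interpolates linearly on a log-dose scale between the two measurements that bracket 50% of the top response. The example data are generated from a simple Hill curve with hypothetical EC50 values; they are not experimental results.

```python
import math
from typing import Optional


def hill_response(dose: float, ec50: float, n: float = 1.0) -> float:
    """Fractional response of a simple Hill model (0 with no ligand, toward 1 at saturation)."""
    return dose**n / (ec50**n + dose**n)


def interpolate_ec50(doses: list[float], responses: list[float],
                     top: Optional[float] = None) -> float:
    """Estimate the dose giving half of the top response.

    doses must be positive and sorted ascending, with responses rising.
    top defaults to the largest observed response. Interpolation is linear
    in log10(dose) between the two points bracketing the half-maximal level.
    """
    half = (max(responses) if top is None else top) / 2
    points = list(zip(doses, responses))
    for (d0, r0), (d1, r1) in zip(points, points[1:]):
        if r0 <= half <= r1 and r1 != r0:
            frac = (half - r0) / (r1 - r0)
            log_d = math.log10(d0) + frac * (math.log10(d1) - math.log10(d0))
            return 10**log_d
    raise ValueError("half-maximal response is not bracketed by the data")


doses = [0.1, 1, 10, 100, 1000]  # hypothetical dose series (arbitrary units)
test_compound = [hill_response(d, ec50=100) for d in doses]
known_ligand = [hill_response(d, ec50=10) for d in doses]
ratio = interpolate_ec50(doses, test_compound, top=1.0) / interpolate_ec50(doses, known_ligand, top=1.0)
print(round(ratio, 3))  # 10.0: the test compound is about 10-fold less potent
```

Comparing EC50 values is only one way to compare curves. A full analysis would fit the curve, for example with a four-parameter logistic model, rather than interpolate between points.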
As used herein, the term "host" is meant to include not only prokaryotes but also eukaryotes such as yeast, plant and animal cells. A recombinant DNA molecule or gene which encodes a human TADG-15 protein of the present invention can be used to transform a host using any of the techniques commonly known to those of ordinary skill in the art. Especially preferred is the use of a vector containing coding sequences for the gene which encodes a human TADG-15 protein of the present invention for purposes of prokaryote transformation. Prokaryotic hosts may include E. coli, S. typhimurium, Serratia marcescens and Bacillus subtilis. Eukaryotic hosts include yeasts such as Pichia pastoris, mammalian cells and insect cells.

In general, expression vectors containing promoter sequences which facilitate the efficient transcription of the inserted DNA fragment are used in connection with the host. The expression vector typically contains an origin of replication, promoter(s), terminator(s), as well as specific genes which are capable of providing phenotypic selection in transformed cells. The transformed hosts can be fermented and cultured according to means known in the art to achieve optimal cell growth.

The invention includes a substantially pure DNA encod- 
ing a TADG-15 protein, a strand of which DNA will 
hybridize at high stringency to a probe containing a 
sequence of at least 15 consecutive nucleotides of (SEQ ID 
NO: 1). The protein encoded by the DNA of this invention 
may share at least 80% sequence identity (preferably 85%, 
more preferably 90%, and most preferably 95%) with the 
amino acids listed in FIG. 10 (SEQ ID NO:2). More 
preferably, the DNA includes the coding sequence of the 
nucleotides of FIG. 9 (SEQ ID NO: 1), or a degenerate 
variant of such a sequence. 

The probe to which the DNA of the invention hybridizes 
preferably consists of a sequence of at least 20 consecutive 
nucleotides, more preferably 40 nucleotides, even more 
preferably 50 nucleotides, and most preferably 100 nucle- 
otides or more (up to 100%) of the coding sequence of the 
nucleotides listed in FIG. 9 (SEQ ID NO:1) or the complement thereof. Such a probe is useful for detecting expression
of TADG-15 in a human cell by a method including the steps 
of (a) contacting mRNA obtained from the cell with the 
labeled hybridization probe; and (b) detecting hybridization 
of the probe with the mRNA. 
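For a probe to detect mRNA, it must be complementary to that mRNA. A probe drawn from the coding sequence of FIG. 9 is therefore taken from the complementary (antisense) strand. The sketch below shows this bookkeeping on a hypothetical 24-nucleotide sequence, not SEQ ID NO:1. The function names are illustrative. The sketch ignores labeling, secondary structure and hybridization stringency.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_probe(coding_seq: str, start: int, length: int = 20) -> str:
    """Return a DNA probe (5'->3') complementary to coding_seq[start:start+length].

    coding_seq is the sense strand written 5'->3', as in a cDNA listing, and
    start is 0-based. The probe is the reverse complement of that window, so it
    base-pairs with the corresponding stretch of mRNA.
    """
    window = coding_seq.upper()[start:start + length]
    if len(window) != length:
        raise ValueError("window runs past the end of the sequence")
    return window.translate(_COMPLEMENT)[::-1]


def pairs_with_mrna(probe: str, mrna: str) -> bool:
    """True if the probe is exactly complementary to some stretch of the mRNA."""
    target = probe.upper().translate(_COMPLEMENT)[::-1].replace("T", "U")
    return target in mrna.upper()


coding = "ATGGCTAAACGTTGGCATCCAGAT"  # hypothetical sense-strand sequence
probe = antisense_probe(coding, start=2, length=20)
print(probe)                                            # CTGGATGCCAACGTTTAGCC
print(pairs_with_mrna(probe, coding.replace("T", "U")))  # True
```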

This invention also includes a substantially pure DNA 
containing a sequence of at least 15 consecutive nucleotides 
(preferably 20, more preferably 30, even more preferably 50, 
and most preferably all) of the region from nucleotides 1 to 
3147 of the nucleotides listed in FIG. 9 (SEQ ID NO:1).

By "high stringency" is meant DNA hybridization and wash conditions characterized by high temperature and low salt concentration, e.g., wash conditions of 65° C. at a salt concentration of approximately 0.1xSSC, or the functional equivalent thereof. For example, high stringency conditions may include hybridization at about 42° C. in the presence of about 50% formamide; a first wash at about 65° C. with about 2xSSC containing 1% SDS; followed by a second wash at about 65° C. with about 0.1xSSC.

By "substantially pure DNA" is meant DNA that is not part of a milieu in which the DNA naturally occurs, by virtue of separation (partial or total purification) of some or all of the molecules of that milieu, or by virtue of alteration of sequences that flank the claimed DNA. The term therefore includes, for example, a recombinant DNA which is incorporated into a vector, into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote; or which exists as a separate molecule (e.g., a cDNA or a genomic or cDNA fragment produced by polymerase chain reaction (PCR) or restriction endonuclease digestion) independent of other sequences. It also includes a recombinant DNA which is part of a hybrid gene encoding additional polypeptide sequence, e.g., a fusion protein. Also included is a recombinant DNA which includes a portion of the nucleotides listed in FIG. 9 (SEQ ID NO:1) which encodes an alternative splice variant of TADG-15.

The DNA may have at least about 70% sequence identity to the coding sequence of the nucleotides listed in FIG. 9 (SEQ ID NO:1), preferably at least 75% (e.g., at least 80%), and most preferably at least 90%. The identity between two sequences is a direct function of the number of matching or identical positions. When a subunit position in both of the two sequences is occupied by the same monomeric subunit, e.g., if a given position is occupied by an adenine in each of two DNA molecules, then they are identical at that position. For example, if 7 positions in a sequence 10 nucleotides in length are identical to the corresponding positions in a second 10-nucleotide sequence, then the two sequences have 70% sequence identity. The length of comparison sequences will generally be at least 50 nucleotides, preferably at least 60 nucleotides, more preferably at least 75 nucleotides, and most preferably 100 nucleotides. Sequence identity is typically measured using sequence analysis software (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705).
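For two sequences that are already aligned position-for-position, the identity measure defined above reduces to counting matching positions. The minimal Python sketch below reproduces the 7-of-10 worked example using hypothetical sequences. It deliberately omits gapped alignment, which is the part that dedicated sequence-analysis software handles.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences match.

    Assumes the sequences are already aligned with no gaps.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Worked example: the first 7 of 10 positions match, giving 70% identity.
print(percent_identity("ACGTACGTAC", "ACGTACGGGG"))  # 70.0
```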

The present invention comprises a vector comprising a 
DNA sequence which encodes a human TADG-15 protein 
and said vector is capable of replication in a host which 
comprises, in operable linkage: a) an origin of replication; b) 
a promoter; and c) a DNA sequence coding for said protein. 
Preferably, the vector of the present invention contains a 
portion of the DNA sequence shown in SEQ ID No:1. A
"vector" may be defined as a replicable nucleic acid 
construct, e.g., a plasmid or viral nucleic acid. Vectors may 
be used to amplify and/or express nucleic acid encoding 
TADG-15 protein. An expression vector is a replicable 
construct in which a nucleic acid sequence encoding a 
polypeptide is operably linked to suitable control sequences 
capable of effecting expression of the polypeptide in a cell. 
The need for such control sequences will vary depending
upon the cell selected and the transformation method cho- 
sen. 

Generally, control sequences include a transcriptional 
promoter and/or enhancer, suitable mRNA ribosomal bind- 
ing sites, and sequences which control the termination of 
transcription and translation. Methods which are well known 
to those skilled in the art can be used to construct expression 
vectors containing appropriate transcriptional and transla- 
tional control signals. See for example, the techniques 
described in Sambrook et al., 1989, Molecular Cloning: A 
Laboratory Manual (2nd Ed.), Cold Spring Harbor Press, 
N.Y. A gene and its transcription control sequences are 
defined as being "operably linked" if the transcription con- 
trol sequences effectively control the transcription of the 
gene. Vectors of the invention include, but are not limited to, 
plasmid vectors and viral vectors. Preferred viral vectors of 
the invention are those derived from retroviruses, 
adenovirus, adeno-associated virus, SV40 virus, or herpes
viruses. 

By a "substantially pure protein" is meant a protein which 
has been separated from at least some of those components 
which naturally accompany it. Typically, the protein is 
substantially pure when it is at least 60%, by weight, free 
from the proteins and other naturally-occurring organic molecules with which it is naturally associated in vivo. Preferably, the purity of the preparation is at least 75%, more preferably at least 90%, and most preferably at least 99%, by weight. A substantially pure TADG-15 protein may be obtained, for example, by extraction from a natural source; by expression of a recombinant nucleic acid encoding a TADG-15 polypeptide; or by chemically synthesizing the protein. Purity can be measured by any appropriate method, e.g., column chromatography such as immunoaffinity chromatography using an antibody specific for TADG-15, polyacrylamide gel electrophoresis, or HPLC analysis. A protein is substantially free of naturally associated components when it is separated from at least some of those contaminants which accompany it in its natural state. Thus, a protein which is chemically synthesized or produced in a cellular system different from the cell from which it naturally originates will be, by definition, substantially free from its naturally associated components. Accordingly, substantially pure proteins include eukaryotic proteins synthesized in E. coli, other prokaryotes, or any other organism in which they do not naturally occur.

In addition to substantially full-length proteins, the inven- 
tion also includes fragments (e.g., antigenic fragments) of 
the TADG-15 protein (SEQ ID No:2). As used herein, 
"fragment," as applied to a polypeptide, will ordinarily be at 
least 10 residues, more typically at least 20 residues, and 
preferably at least 30 (e.g., 50) residues in length, but less 
than the entire, intact sequence. Fragments of the TADG-15 
protein can be generated by methods known to those skilled 
in the art, e.g., by enzymatic digestion of naturally occurring 
or recombinant TADG-15 protein, by recombinant DNA 
techniques using an expression vector that encodes a defined 
fragment of TADG-15, or by chemical synthesis. The ability 
of a candidate fragment to exhibit a characteristic of TADG- 
15 (e.g., binding to an antibody specific for TADG-15) can 
be assessed by methods described herein. Purified TADG-15 
or antigenic fragments of TADG-15 can be used to generate 
new antibodies or to test existing antibodies (e.g., as positive 
controls in a diagnostic assay) by employing standard pro- 
tocols known to those skilled in the art. Included in this 
invention are polyclonal antisera generated by using TADG- 
15 or a fragment of TADG-15 as the immunogen in, e.g., 
rabbits. Standard protocols for monoclonal and polyclonal 
antibody production known to those skilled in this art are 
employed. The monoclonal antibodies generated by this 
procedure can be screened for the ability to identify recom- 
binant TADG-15 cDNA clones, and to distinguish them 
from known cDNA clones. 

Further included in this invention are TADG-15 proteins 
which are encoded at least in part by portions of SEQ ID 
NO:2, e.g., products of alternative mRNA splicing or alter- 
native protein processing events, or in which a section of 
TADG-15 sequence has been deleted. The fragment, or the 
intact TADG-15 polypeptide, may be covalently linked to 
another polypeptide, e.g., one which acts as a label, a ligand or a
means to increase antigenicity. 

The invention also includes a polyclonal or monoclonal 
antibody which specifically binds to TADG-15. The inven- 
tion encompasses not only an intact monoclonal antibody, 
but also an immunologically-active antibody fragment, e.g., 
a Fab or F(ab')2 fragment; an engineered single chain Fv
molecule; or a chimeric molecule, e.g., an antibody which 
contains the binding specificity of one antibody, e.g., of 
murine origin, and the remaining portions of another 
antibody, e.g., of human origin. 

In one embodiment, the antibody, or a fragment thereof, may be linked to a toxin or to a detectable label, e.g., a radioactive label, non-radioactive isotopic label, fluorescent label, chemiluminescent label, paramagnetic label, enzyme label, or colorimetric label. Examples of suitable toxins include diphtheria toxin, Pseudomonas exotoxin A, ricin, and cholera toxin. Examples of suitable enzyme labels include malate dehydrogenase, staphylococcal nuclease, delta-5-steroid isomerase, alcohol dehydrogenase, alpha-glycerol phosphate dehydrogenase, triose phosphate isomerase, peroxidase, alkaline phosphatase, asparaginase, glucose oxidase, beta-galactosidase, ribonuclease, urease, catalase, glucose-6-phosphate dehydrogenase, glucoamylase, acetylcholinesterase, etc. Examples of suitable radioisotopic labels include ³H, ¹²⁵I, ¹³¹I, ³²P, ³⁵S, ¹⁴C, etc.

Paramagnetic isotopes for purposes of in vivo diagnosis can also be used according to the methods of this invention. There are numerous examples of elements that are useful in magnetic resonance imaging. For discussions on in vivo nuclear magnetic resonance imaging, see, for example, Schaefer et al., (1989) JACC 14, 472-480; Shreve et al., (1986) Magn. Reson. Med. 3, 336-340; Wolf, G. L., (1984) Physiol. Chem. Phys. Med. NMR 16, 93-95; Wesbey et al., (1984) Physiol. Chem. Phys. Med. NMR 16, 145-155; Runge et al., (1984) Invest. Radiol. 19, 408-415. Examples of suitable fluorescent labels include a fluorescein label, an isothiocyanate label, a rhodamine label, a phycoerythrin label, a phycocyanin label, an allophycocyanin label, an o-phthaldehyde label, a fluorescamine label, etc. Examples of chemiluminescent labels include a luminol label, an isoluminol label, an aromatic acridinium ester label, an imidazole label, an acridinium salt label, an oxalate ester label, a luciferin label, a luciferase label, an aequorin label, etc.

Those of ordinary skill in the art will know of other 
suitable labels which may be employed in accordance with 
the present invention. The binding of these labels to anti- 
bodies or fragments thereof can be accomplished using 
standard techniques commonly known to those of ordinary 
skill in the art. Typical techniques are described by Kennedy et al., (1976) Clin. Chim. Acta 70, 1-31; and Schurs et al., (1977) Clin. Chim. Acta 81, 1-40. Coupling techniques mentioned in the latter are the glutaraldehyde method, the periodate method, the dimaleimide method, and the m-maleimidobenzyl-N-hydroxy-succinimide ester method.
All of these methods are incorporated by reference herein. 

Also within the invention is a method of detecting TADG- 
15 protein in a biological sample, which includes the steps 
of contacting the sample with the labeled antibody, e.g., 
radioactively tagged antibody specific for TADG-15, and 
determining whether the antibody binds to a component of 
the sample. 

As described herein, the invention provides a number of 
diagnostic advantages and uses. For example, the TADG-15 
protein is useful in diagnosing cancer in different tissues
since this protein is highly overexpressed in tumor cells.
Antibodies (or antigen-binding fragments thereof) which
bind to an epitope specific for TADG-15 are useful in a
method of detecting TADG-15 protein in a biological sample 
for diagnosis of cancerous or neoplastic transformation. This 
method includes the steps of obtaining a biological sample 
(e.g., cells, blood, plasma, tissue, etc.) from a patient sus- 
pected of having cancer, contacting the sample with a 
labeled antibody (e.g., radioactively tagged antibody) spe- 
cific for TADG-15, and detecting the TADG-15 protein 
using standard immunoassay techniques such as an ELISA. 
Antibody binding to the biological sample indicates that the 
sample contains a component which specifically binds to an 
epitope within TADG-15. 

Likewise, a standard Northern blot assay can be used to 
ascertain the relative amounts of TADG-15 mRNA in a cell 
or tissue obtained from a patient suspected of having cancer, in accordance with conventional Northern hybridization techniques known to those of ordinary skill in the art. This Northern assay uses a hybridization probe, e.g., radiolabelled TADG-15 cDNA, either containing the full-length, single-stranded DNA having a sequence complementary to SEQ ID NO:1 (FIG. 9), or a fragment of that DNA sequence at least 20 (preferably at least 30, more preferably at least 50, and most preferably at least 100) consecutive nucleotides in length. The DNA hybridization probe can be labeled by any of the many different methods known to those skilled in this art.

Antibodies to the TADG-15 protein can be used in an 
immunoassay to detect increased levels of TADG-15 protein
expression in tissues suspected of neoplastic transformation.
These same uses can be achieved with Northern blot assays 
and analyses. 

The present invention is directed to DNA encoding a 
TADG-15 protein selected from the group consisting of: (a) 
isolated DNA which encodes a TADG-15 protein; (b) iso- 
lated DNA which hybridizes to isolated DNA of (a) above 
and which encodes a TADG-15 protein; and (c) isolated 
DNA differing from the isolated DNAs of (a) and (b) above 
in codon sequence due to the degeneracy of the genetic code, 
and which encodes a TADG-15 protein. Preferably, the DNA 
has the sequence shown in SEQ ID No:1. More preferably,
the DNA encodes a TADG-15 protein having the amino acid 
sequence shown in SEQ ID No: 2. 
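Alternative (c) relies on the degeneracy of the genetic code, meaning that different codons can specify the same amino acid. The Python sketch below builds the standard genetic code and shows two hypothetical DNA sequences, not sequences from this disclosure, that differ only at codon third positions yet encode the same peptide. It assumes the standard (nuclear) code and translation in frame from the first base.

```python
from itertools import product

# Standard genetic code, generated in TCAG order over the first, second and
# third codon positions; "*" marks stop codons.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence (5'->3', read in frame from the first base)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Two hypothetical DNAs differing only at third codon positions.
dna_1 = "ATGGCTAAAGGT"
dna_2 = "ATGGCCAAGGGC"
print(translate(dna_1), translate(dna_2))  # MAKG MAKG
```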

The present invention is also directed to a vector capable 
of expressing the DNA of the present invention adapted for 
expression in a recombinant cell and regulatory elements 
necessary for expression of the DNA in the cell. Preferably, 
the vector contains DNA encoding a TADG-15 protein 
having the amino acid sequence shown in SEQ ID No: 2. 

The present invention is also directed to a host cell 
transfected with the vector described herein, said vector 
expressing a TADG-15 protein. Representative host cells include bacterial cells, mammalian cells and insect cells.

The present invention is also directed to an isolated and purified TADG-15 protein coded for by DNA selected from the group consisting of: (a) isolated DNA which encodes a TADG-15 protein; (b) isolated DNA which hybridizes to isolated DNA of (a) above and which encodes a TADG-15 protein; and (c) isolated DNA differing from the isolated DNAs of (a) and (b) above in codon sequence due to the degeneracy of the genetic code, and which encodes a TADG-15 protein. Preferably, the isolated and purified TADG-15 protein of claim 9 has the amino acid sequence shown in SEQ ID No:2.

The present invention is also directed to a method of detecting expression of the protein of claim 1, comprising the steps of: (a) contacting mRNA obtained from the cell with the labeled hybridization probe; and (b) detecting hybridization of the probe with the mRNA.

The following examples are given for the purpose of 
illustrating various embodiments of the invention and are 
not meant to limit the present invention in any fashion. 


EXAMPLE 1 
Tissue collection and storage 

Upon patient hysterectomy, bilateral salpingo-oophorectomy, or surgical removal of neoplastic tissue, the specimen was retrieved and placed on ice. The specimen was then taken to the resident pathologist for isolation and identification of specific tissue samples. Finally, the sample was frozen in liquid nitrogen, logged into the laboratory record and stored at -80° C. Additional specimens were frequently obtained from the Cooperative Human Tissue Network (CHTN). These samples were prepared by the CHTN and shipped to us on dry ice. Upon arrival, these specimens were logged into the laboratory record and stored at -80° C.

EXAMPLE 2 
mRNA isolation and cDNA synthesis 

Forty-one ovarian tumors (10 low malignant potential tumors and 31 carcinomas) and 10 normal ovaries were obtained from surgical specimens and frozen in liquid nitrogen. The human ovarian carcinoma cell lines SW 626 and Caov 3, the human breast carcinoma cell lines MDA-MB-231 and MDA-MB-435S, and the human uterine cervical carcinoma cell line HeLa were purchased from the American Type Culture Collection (Rockville, Md.). Cells were cultured to subconfluency in Dulbecco's modified Eagle's medium supplemented with 10% (v/v) fetal bovine serum and antibiotics.

Messenger RNA (mRNA) isolation was performed according to the manufacturer's instructions using the Mini RiboSep™ Ultra mRNA isolation kit purchased from Becton Dickinson (cat. #30034). This was an oligo(dT) chromatography-based system of mRNA isolation. The amount of mRNA recovered was quantitated by UV spectrophotometry.
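Quantitation by UV spectrophotometry usually relies on absorbance at 260 nm. The Python sketch below uses the conventional approximation that an A260 of 1.0 over a 1 cm path corresponds to about 40 µg/ml of single-stranded RNA. That factor and the example reading are assumptions, since the text does not state the conversion used. The helper then computes how much stock holds the 50 ng of mRNA used per PCR template.

```python
# Conventional approximation for single-stranded RNA: A260 of 1.0 (1 cm path)
# corresponds to about 40 ug/ml. This factor is an assumption, not from the text.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate the RNA concentration of the undiluted stock from a diluted A260 reading."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def volume_for_mass_ul(target_ng: float, conc_ug_per_ml: float) -> float:
    """Microliters of stock containing target_ng of RNA (1 ug/ml equals 1 ng/ul)."""
    return target_ng / conc_ug_per_ml


conc = rna_concentration_ug_per_ml(0.25, dilution_factor=10)  # hypothetical reading
print(conc)                          # 100.0 (ug/ml)
print(volume_for_mass_ul(50, conc))  # 0.5 (ul of stock holding 50 ng)
```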

First strand complementary DNA (cDNA) was synthesized using 5.0 µg of mRNA and either random hexamer or
oligo(dT) primers according to the manufacturer's protocol 
utilizing a first strand synthesis kit obtained from Clontech 
(cat.# K1402-1). The purity of the cDNA was evaluated by 
PCR using primers specific for the p53 gene. These primers 
span an intron such that pure cDNA can be distinguished 
from cDNA that is contaminated with genomic DNA. 
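The intron-spanning check works because spliced cDNA lacks the intron. Genomic DNA therefore gives a larger product, longer by the intron length, from the same primer pair. The Python sketch below encodes that reasoning. The p53 product and intron sizes are not given in the text, so the sizes and the ±20 bp sizing tolerance used here are hypothetical.

```python
def classify_cdna_prep(observed_bp: list[int], cdna_bp: int, intron_bp: int,
                       tolerance_bp: int = 20) -> str:
    """Interpret gel bands from PCR with primers that flank an intron.

    A spliced (cDNA) template gives a product of cdna_bp; genomic DNA still
    contains the intron and gives cdna_bp + intron_bp.
    """
    genomic_bp = cdna_bp + intron_bp

    def near(size: int, expected: int) -> bool:
        return abs(size - expected) <= tolerance_bp

    if any(near(s, genomic_bp) for s in observed_bp):
        return "genomic DNA contamination"
    if any(near(s, cdna_bp) for s in observed_bp):
        return "pure cDNA"
    return "no expected product"


# Hypothetical sizes: 300 bp product from cDNA, 800 bp intron (not actual p53 values).
print(classify_cdna_prep([298], cdna_bp=300, intron_bp=800))        # pure cDNA
print(classify_cdna_prep([301, 1105], cdna_bp=300, intron_bp=800))  # genomic DNA contamination
```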

EXAMPLE 3 

PCR reactions 

The mRNA overexpression of TADG-15 was determined using a quantitative PCR. Oligonucleotide primers were used for: TADG-15, forward 5'-ATGACAGAGGATTCAGGTAC-3' and reverse 5'-GAAGGTGAAGTCATTGAAGA-3'; and β-tubulin, forward 5'-TGCATTGACAACGAGGC-3' and reverse 5'-CTGTCTTGACATTGTTG-3'. β-tubulin was utilized as an internal control. Reactions were carried out as follows: first strand cDNA generated from 50 ng of mRNA was used as template in the presence of 1.0 mM MgCl2, 0.2 mM dNTPs, 0.025 U Taq polymerase/µl of reaction, and 1x buffer supplied with the enzyme. In addition, primers must be added to the PCR reaction. Degenerate primers which may amplify a variety of cDNAs are used at a final concentration of 2.0 µM each, whereas primers which amplify specific cDNAs are added to a final concentration of 0.2 µM each.
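The C1V1 = C2V2 relation converts the final concentrations above into volumes of stock to pipette. This Python sketch assumes a 25 µl reaction and hypothetical stock concentrations (25 mM MgCl2, 10 mM of each dNTP, Taq at 5 U/µl), none of which are specified in the text. It also reads the Taq figure as 0.025 U per microliter of reaction. At that reading, a 25 µl reaction contains 0.625 U of enzyme, which is the amount listed in Example 7.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, reaction_ul: float) -> float:
    """C1*V1 = C2*V2: volume of stock that brings the reaction to final_conc.

    final_conc and stock_conc must share units (e.g. both mM, or both U/ul).
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the final mix")
    return final_conc * reaction_ul / stock_conc


REACTION_UL = 25.0  # assumed reaction volume
# component: (final concentration from the text, hypothetical stock concentration)
components = {
    "MgCl2 (mM)": (1.0, 25.0),
    "dNTPs, each (mM)": (0.2, 10.0),
    "Taq polymerase (U/ul)": (0.025, 5.0),
}

added = 0.0
for name, (final, stock) in components.items():
    vol = stock_volume_ul(final, stock, REACTION_UL)
    added += vol
    print(f"{name}: {vol:.3f} ul")
print(f"template, primers, buffer and water: {REACTION_UL - added:.3f} ul")
```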

After initial denaturation at 95° C. for 3 minutes, thirty cycles of PCR are carried out in a Perkin Elmer GeneAmp 2400 thermal cycler. Each cycle consists of 30 seconds of denaturation at 95° C., 30 seconds of primer annealing at the appropriate annealing temperature, and 30 seconds of extension at 72° C. The final cycle is extended at 72° C. for 7 minutes. To ensure that the reaction succeeded, a fraction of the mixture is electrophoresed through a 2% agarose/TAE gel stained with ethidium bromide (final concentration 1 µg/ml). The annealing temperature varies according to the primers that are used in the PCR reaction. For the reactions involving degenerate primers, an annealing temperature of 48° C. was used. The appropriate annealing temperature for the TADG-15 and β-tubulin specific primers is 62° C.
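A common first-pass estimate of primer melting temperature for short oligonucleotides is the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. The Python sketch below applies this rule, together with GC content, to the four primer sequences listed in this example. The rule is brought in here as an illustration only. It is not the method the text used to choose 48° C. or 62° C.

```python
PRIMERS = {
    "TADG-15 forward": "ATGACAGAGGATTCAGGTAC",
    "TADG-15 reverse": "GAAGGTGAAGTCATTGAAGA",
    "beta-tubulin forward": "TGCATTGACAACGAGGC",
    "beta-tubulin reverse": "CTGTCTTGACATTGTTG",
}


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule: Tm ~ 2*(A+T) + 4*(G+C) degrees C, a rough guide for short oligos."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.0f}%, Wallace Tm {wallace_tm(seq)} C")
# Wallace Tm values: 58, 56, 52 and 48 C, in the order listed above.
```

The rule ignores salt concentration, primer concentration and nearest-neighbour stacking effects. These estimates are therefore rough guides and should not be read as a check on the annealing temperatures given above.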
EXAMPLE 4
T-vector ligation and transformations

The purified PCR products are ligated into the Promega T-vector plasmid and the ligation products are used to transform JM109 competent cells according to the manufacturer's instructions (Promega cat. #A3610). Positive colonies were cultured for amplification, the plasmid DNA was isolated by means of the Wizard™ Minipreps DNA purification system (Promega cat. #A7500), and the plasmids were digested with ApaI and SacI restriction enzymes to determine the size of the insert. Plasmids with inserts of the size(s) visualized by the previously described PCR product gel electrophoresis were sequenced.

EXAMPLE 5

DNA sequencing

Utilizing a plasmid-specific primer near the cloning site, sequencing reactions were carried out using PRISM™ Ready Reaction Dye Deoxy™ terminators (Applied Biosystems cat. #401384) according to the manufacturer's instructions. Residual dye terminators were removed from the completed sequencing reaction using a Centri-Sep™ spin column (Princeton Separations cat. #CS-901). An Applied Biosystems Model 373A DNA Sequencing System was available and was used for sequence analysis. Based upon the determined sequence, primers that specifically amplify the gene of interest were designed and synthesized.

EXAMPLE 6

Northern blot analysis

mRNA samples (10 µg) were size-separated by electrophoresis through a 1% formaldehyde-agarose gel in 0.02 M MOPS, 0.05 M sodium acetate (pH 7.0), and 0.001 M EDTA. The mRNAs were then blotted onto Hybond-N (Amersham) by capillary action in 20× SSPE and fixed to the membrane by baking for 2 hours at 80° C.

Additional multiple tissue northern (MTN) blots were purchased from CLONTECH Laboratories, Inc. These blots include the Human MTN blot (cat. #7760-1), the Human MTN II blot (cat. #7759-1), the Human Fetal MTN II blot (cat. #7756-1), and the Human Brain MTN III blot (cat. #7750-1). The appropriate probes were radiolabeled using the Prime-a-Gene Labeling System from Promega (cat. #U1100). The blots were probed and stripped according to the ExpressHyb Hybridization Solution protocol from CLONTECH (cat. #8015-1 or 8015-2).

EXAMPLE 7

Quantitative PCR

Quantitative PCR was performed in a reaction mixture consisting of cDNA derived from 50 ng of mRNA, 5 pmol of sense and antisense primers for TADG-15 and the internal control β-tubulin, 0.2 mmol of dNTPs, 0.5 µCi of [α-³²P]dGTP, and 0.625 U of Taq polymerase in 1× buffer, in a final volume of 25 µl. This mixture was subjected to 1 minute of denaturation at 95° C., followed by 30 cycles of denaturation for 30 seconds at 95° C., 30 seconds of annealing at 62° C., and 1 minute of extension at 72° C., with an additional 7 minutes of extension on the last cycle. The product was separated by electrophoresis through a 2% agarose gel, and the gel was dried under vacuum and autoradiographed. The relative radioactivity of each band was determined with a PhosphorImager (Molecular Dynamics).
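Each lane yields two band signals, TADG-15 and β-tubulin, and the expression level reported in Table 2 is their ratio, summarized as mean ± SD. A minimal Python sketch of that arithmetic follows; the band values are made-up placeholders rather than data from this work, background subtraction is assumed to have been done already, and the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which SD was used.

```python
from statistics import mean, stdev


def expression_ratio(target: float, control: float) -> float:
    """Ratio of the TADG-15 band signal to the beta-tubulin band signal in one lane."""
    if control <= 0:
        raise ValueError("internal-control signal must be positive")
    return target / control


def summarize(lanes):
    """Mean and sample SD of per-lane ratios; lanes holds (target, control) pairs."""
    ratios = [expression_ratio(t, c) for t, c in lanes]
    sd = stdev(ratios) if len(ratios) > 1 else 0.0
    return mean(ratios), sd


# Hypothetical background-subtracted PhosphorImager band volumes (not real data).
lanes = [(1200.0, 1500.0), (900.0, 1000.0)]
m, sd = summarize(lanes)
print(f"{m:.3f} ± {sd:.3f}")  # 0.850 ± 0.071
```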
EXAMPLE 8

The present invention describes the use of primers directed to conserved areas of the serine protease family to identify members of that family that are overexpressed in carcinoma. Several genes were identified that had been cloned in other tissues but had not previously been associated with ovarian carcinoma. The present invention describes a protease identified in ovarian carcinoma. This gene was identified using primers directed to the conserved region surrounding the catalytic-domain histidine and to the downstream conserved serine, which lies approximately 150 amino acids toward the carboxyl end of the protease.

The gene encoding the novel extracellular serine protease of the present invention was identified from a group of proteases overexpressed in carcinoma by subcloning and sequencing the appropriate PCR products. An example of such a PCR reaction is given in FIG. 1. Subcloning and sequencing of individual bands from such an amplification provided the basis for identifying the protease of the present invention.

EXAMPLE 9

The sequence determined for the catalytic domain of TADG-15 is presented in FIG. 2. It is consistent with other serine proteases and, specifically, contains the conserved amino acids appropriate for the catalytic domain of the trypsin-like serine protease family. Specific primers (20-mers) derived from this sequence were used to examine TADG-15 expression.

A series of normal and tumor cDNAs was examined to determine the expression of the TADG-15 gene in ovarian carcinoma. When normal-derived cDNA was compared with carcinoma-derived cDNA, using β-tubulin as an internal control for PCR amplification, TADG-15 was significantly overexpressed in all of the carcinomas examined and was either not detected or detected only at a very low level in normal epithelial tissue (FIG. 3). This evaluation was extended to a standard panel of about 40 tumors. Using these specific primers, the expression of this gene was also examined in tumor cell lines derived from both ovarian and breast carcinoma tissues, as shown in FIG. 5, and in other tumor tissues, as shown in FIG. 6. Expression of TADG-15 was also observed in carcinomas of the breast, colon, prostate, and lung.

Using the specific TADG-15 sequence covering the full catalytic domain as a probe, three Northern blots were examined: one derived from ovarian tissues, both normal and carcinoma; one from fetal tissues; and one from adult normal tissues. As shown in FIG. 7, TADG-15 transcripts were detected in all ovarian carcinomas but were not present at detectable levels in any of the following tissues: a) normal ovary; b) fetal liver and brain; c) adult spleen, thymus, testes, ovary, and peripheral blood lymphocytes; d) skeletal muscle, liver, brain, or heart. The transcript size was approximately 3.2 kb. The fetal and adult blots were hybridized under appropriate conditions with the same probe used for the ovarian tissue. Subsequent to this examination, it was confirmed that these blots contained other detectable mRNA transcripts.

Initially, the catalytic domain of the protease was used to probe HeLa cDNA and ovarian tumor cDNA libraries, and one clone covering the entire 3' end of the TADG-15 gene was obtained from the ovarian tumor library. On further screening using the 5' end of the newly detected clones, two more clones covering the 5' end of the TADG-15 gene were identified from the HeLa library (FIG. 8). The complete nucleotide sequence (SEQ ID No: 1) is provided in FIG. 9 along with the translation of the open reading frame (SEQ ID No: 2).

The nucleotide sequence contains a Kozak sequence typical of sequences upstream from the translation initiation site. There is also a polyadenylation signal sequence and a polyadenylated tail. The open reading frame encodes an 855-amino-acid sequence (SEQ ID No: 2) that includes an amino-terminal cytoplasmic tail (amino acids 1-50) and an approximately 22-amino-acid transmembrane domain, followed by an extracellular sequence preceding two CUB repeats, originally identified in complement subcomponents C1r and C1s. These two repeats are followed by four class A repeat domains of the LDL receptor, which are in turn followed by the trypsin-family protease domain that constitutes the carboxyl end of the TADG-15 protein (FIG. 11). The conserved histidine, aspartic acid, and serine residues of the catalytic domain, together with a series of amino acids conserved in the serine protease family, are also clearly delineated (FIG. 10).
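The open-reading-frame assignment can be checked against the sequence listing by translating SEQ ID No: 1 from the ATG at nucleotide 23. This Python sketch uses the standard genetic code and one-letter residue codes (the listing itself uses three-letter codes); `translate_from` is a hypothetical helper, and only short excerpts of SEQ ID No: 1 are used rather than the full 3147-nucleotide sequence.

```python
BASES = "tcag"
# Standard genetic code (NCBI translation table 1); codons are enumerated with
# the first base varying slowest, in t, c, a, g order. "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate_from(seq: str, start: int) -> str:
    """Translate seq from 1-based position `start` until a stop codon or the last full codon."""
    seq = seq.lower().replace(" ", "")
    protein = []
    for pos in range(start - 1, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")  # "X" for non-ACGT characters
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 60 nucleotides of SEQ ID No: 1; the ORF starts at the ATG at position 23.
five_prime = ("tcaagagcgg cctcggggta ccatggggag cgatcgggcc "
              "cgcaagggcg gagggggccc")
print(translate_from(five_prime, 23))  # MGSDRARKGGGG

# Excerpt from the end of the coding region, read from its first base.
three_prime = "gactggatcaaagagaacactggggtatag"
print(translate_from(three_prime, 1))  # DWIKENTGV
```

The two outputs match the first twelve residues (Met-Gly-Ser-Asp-Arg-Ala-Arg-Lys-Gly-Gly-Gly-Gly) and the last nine residues (Asp-Trp-Ile-Lys-Glu-Asn-Thr-Gly-Val) of SEQ ID No: 2, with translation halting at the TAG stop codon.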

A search of GenBank for similar, previously identified sequences yielded one sequence with relatively high homology to a portion of the TADG-15 gene. The similarity between TADG-15 nucleotides 182 to 3139 and SNC-19 (GenBank accession #U20428) is approximately 97% (FIG. 12). There are, however, significant differences between SNC-19 and TADG-15: TADG-15 has an open reading frame of 855 amino acids, whereas the longest ORF of SNC-19 is only 173 amino acids. SNC-19 includes neither a proper start site for the initiation of translation nor the amino-terminal portion of the protein encoded by TADG-15. Moreover, SNC-19 does not include an ORF for a functional serine protease, because the His, Asp, and Ser residues necessary for function are encoded in different reading frames.
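The numbering in this comparison implies that nucleotide 1 of SNC-19 (SEQ ID NO 9) corresponds to nucleotide 182 of TADG-15. A simple way to check such a correspondence is ungapped percent identity over a short window, sketched below in Python. The helper name `percent_identity` is hypothetical, and the approach only applies to windows free of insertions and deletions. The text does not state how the 97% figure was computed; a full-length comparison would need a gapped aligner (for example BLAST), because a single inserted or deleted nucleotide shifts every downstream position, which is also one way coding residues can end up in different reading frames.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two pre-aligned, equal-length sequences."""
    a = a.lower().replace(" ", "")
    b = b.lower().replace(" ", "")
    if not a or len(a) != len(b):
        raise ValueError("expected two non-empty sequences of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# TADG-15 (SEQ ID No: 1) nucleotides 182-241, re-chunked into 10-mers,
# and SNC-19 (SEQ ID NO 9) nucleotides 1-60, as given in the sequence listing.
tadg15_182_241 = ("cgctgggtgg tgctggcagc cgtgctgatc ggcctcctct "
                  "tggtcttgct ggggatcggc")
snc19_1_60 = ("cgctgggtgg tgctggcagc cgtgctgatc ggcctcctct "
              "tggtcttgct ggggatcggc")
print(percent_identity(tadg15_182_241, snc19_1_60))  # 100.0
print(percent_identity("acgtacgt", "acgtacga"))      # 87.5
```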

TADG-15 is highly overexpressed in tumors. It is expressed in a limited number of normal tissues, primarily tissues involved in either the uptake or the secretion of molecules, e.g., colon and pancreas. TADG-15 is further novel in its domain structure: it has a protease catalytic domain that could be released and used as a diagnostic marker and that is a potential target for therapeutic intervention. TADG-15 also has ligand-binding domains of a kind commonly associated with molecules that internalize ligands from the external surface of the cell, as the LDL receptor does for the LDL-cholesterol complex. These domains may be involved in the uptake of specific ligands, and they may offer a means of delivering toxic molecules or genes to tumor cells that express this molecule on their surface. TADG-15 shares features with the hepsin serine protease in that it also has an amino-terminal transmembrane domain, with the proteolytic catalytic domain extending into the extracellular matrix. The difference is that TADG-15 includes ligand-binding repeat domains that the hepsin gene does not have. In addition to the use of this gene as a diagnostic or therapeutic target in ovarian carcinoma and in other carcinomas, including breast, prostate, lung, and colon, its ligand-binding domains may be valuable for the uptake of specific molecules into tumor cells. Table 2 shows the number of cases with overexpression of TADG-15 in normal ovaries and ovarian tumors.

Any patents or publications mentioned in this specification are indicative of the levels of those skilled in the art to which the invention pertains. These patents and publications are herein incorporated by reference to the same extent as if each individual publication was specifically and individually indicated to be incorporated by reference.

One skilled in the art will readily appreciate that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those inherent therein. The present examples, along with the methods, procedures, treatments, molecules, and specific compounds described herein, are presently representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Changes therein and other uses will occur to those skilled in the art which are encompassed within the spirit of the invention as defined by the scope of the claims.

TABLE 2

Number of cases with overexpression of TADG-15 in normal ovaries and ovarian tumors.

                   N    Overexpression of TADG-15    Expression ratio(a)
Normal            10     0 (0%)                      0.182 ± 0.024
LMP               10    10 (100%)                    0.847 ± 0.419
  serous           6     6 (100%)                    0.862 ± 0.419
  mucinous         4     4 (100%)                    0.825 ± 0.483
Carcinoma         31    31 (100%)                    0.771 ± 0.380
  serous          18    18 (100%)                    0.779 ± 0.332
  mucinous         7     7 (100%)                    0.907 ± 0.584
  endometrioid     3     3 (100%)                    0.502 ± 0.083
  clear cell       3     3 (100%)                    0.672 ± 0.077

(a) Ratio of the expression level of TADG-15 to β-tubulin (mean ± SD).

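The subgroup rows of Table 2 are consistent with the group rows: weighting each subgroup mean by its N reproduces the reported LMP and carcinoma means to three decimals. The Python sketch below shows that arithmetic; `pooled_mean` is a hypothetical helper, and the inputs are copied from the table. Standard deviations cannot be pooled this simply, because the pooled SD also depends on the spread between the subgroup means.

```python
def pooled_mean(groups):
    """N-weighted mean of subgroup means; groups holds (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# (N, mean expression ratio) for each subgroup, copied from Table 2
lmp = [(6, 0.862), (4, 0.825)]                                  # serous, mucinous
carcinoma = [(18, 0.779), (7, 0.907), (3, 0.502), (3, 0.672)]   # serous, mucinous, endometrioid, clear cell

print(round(pooled_mean(lmp), 3))        # 0.847
print(round(pooled_mean(carcinoma), 3))  # 0.771
```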


SEQUENCE LISTING 



<160> NUMBER OF SEQ ID NOS: 13

<210> SEQ ID NO 1
<211> LENGTH: 3147
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<222> LOCATION: 23..2589
<223> OTHER INFORMATION: cDNA sequence of TADG-15
<400> SEQUENCE: 1

tcaagagcgg cctcggggta ccatggggag cgatcgggcc cgcaagggcg gagggggccc 60
gaaggacttc ggcgcgggac tcaagtacaa ctcccggcac gagaaagtga atggcttgga 120



5,972,616 

17 18 

-continued 

ggaaggcgtg gagttcctgc cagtcaacaa cgtcaagaag gtggaaaagc atggcccggg 180
gcgctgggtg gtgctggcag ccgtgctgat cggcctcctc ttggtcttgc tggggatcgg 240
cttcctggtg tggcatttgc agtaccggga cgtgcgtgtc cagaaggtct tcaatggcta 300
catgaggatc acaaatgaga attttgtgga tgcctacgag aactccaact ccactgagtt 360
tgtaagcctg gccagcaagg tgaaggacgc gctgaagctg ctgtacagcg gagtcccatt 420
cctgggcccc taccacaagg agtcggctgt gacggccttc agcgagggca gcgtcatcgc 480
ctactactgg tctgagttca gcatcccgca gcacctggtg gaggaggccg agcgcgtcat 540
ggccgaggag cgcgtagtca tgctgccccc gcgggcgcgc tccctgaagt cctttgtggt 600
cacctcagtg gtggctttcc ccacggactc caaaacagta cagaggaccc aggacaacag 660
ctgcagcttt ggcctgcacg cccgcggtgt ggagctgatg cgcttcacca cgcccggctt 720
ccctgacagc ccctaccccg ctcatgcccg ctgccagtgg gccctgcggg gggacgccga 780
ctcagtgctg agcctcacct tccgcagctt tgaccttgcg tcctgcgacg agcgcggcag 840
cgacctggtg acggtgtaca acaccctgag ccccatggag ccccacgccc tggtgcagtt 900
gtgtggcacc taccctccct cctacaacct gaccttccac tcctcccaga acgtcctgct 960
catcacactg ataaccaaca ctgagcggcg gcatcccggc tttgaggcca ccttcttcca 1020
gctgcctagg atgagcagct gtggaggccg cttacgtaaa gcccagggga cattcaacag 1080
cccctactac ccaggccact acccacccaa cattgactgc acatggaaca ttgaggtgcc 1140
caacaaccag catgtgaagg tgagcttcaa attcttctac ctgctggagc ccggcgtgcc 1200
tgcgggcacc tgccccaagg actacgtgga gatcaatggg gagaaatact gcggagagag 1260
gtcccagttc gtcgtcacca gcaacagcaa caagatcaca gttcgcttcc actcagatca 1320
gtcctacacc gacaccggct tcttagctga atacctctcc tacgactcca gtgacccatg 1380
cccggggcag ttcacgtgcc gcacggggcg gtgtatccgg aaggagctgc gctgtgatgg 1440
ctgggccgac tgcaccgacc acagcgatga gctcaactgc agttgcgacg ccggccacca 1500
gttcacgtgc aagaacaagt tctgcaagcc cctcttctgg gtctgcgaca gtgtgaacga 1560
ctgcggagac aacagcgacg agcaggggtg cagttgtccg gcccagacct tcaggtgttc 1620
caatgggaag tgcctctcga aaagccagca gtgcaatggg aaggacgact gtggggacgg 1680
gtccgacgag gcctcctgcc ccaaggtgaa cgtcgtcact tgtaccaaac acacctaccg 1740
ctgcctcaat gggctctgct tgagcaaggg caaccctgag tgtgacggga aggaggactg 1800
tagcgacggc tcagatgaga aggactgcga ctgtgggctg cggtcattca cgagacaggc 1860
tcgtgttgtt gggggcacgg atgcggatga gggcgagtgg ccctggcagg taagcctgca 1920
tgctctgggc cagggccaca tctgcggtgc ttccctcatc tctcccaact ggctggtctc 1980
tgccgcacac tgctacatcg atgacagagg attcaggtac tcagacccca cgcagtggac 2040
ggccttcctg ggcttgcacg accagagcca gcgcagcgcc cctggggtgc aggagcgcag 2100
gctcaagcgc atcatctccc accccttctt caatgacttc accttcgact atgacatcgc 2160
gctgctggag ctggagaaac cggcagagta cagctccatg gtgcggccca tctgcctgcc 2220
ggacgcctcc catgtcttcc ctgccggcaa ggccatctgg gtcacgggct ggggacacac 2280
ccagtatgga ggcactggcg cgctgatcct gcaaaagggt gagatccgcg tcatcaacca 2340
gaccacctgc gagaacctcc tgccgcagca gatcacgccg cgcatgatgt gcgtgggctt 2400
cctcagcggc ggcgtggact cctgccaggg tgattccggg ggacccctgt ccagcgtgga 2460
ggcggatggg cggatcttcc aggccggtgt ggtgagctgg ggagacggct gcgctcagag 2520






gaacaagcca ggcgtgtaca caaggctccc tctgtttcgg gactggatca aagagaacac 2580
tggggtatag gggccggggc cacccaaatg tgtacacctg cggggccacc catcgtccac 2640
cccagtgtgc acgcctgcag gctggagact ggaccgctga ctgcaccagc gcccccagaa 2700
catacactgt gaactcaatc tccagggctc caaatctgcc tagaaaacct ctcgcttcct 2760
cagcctccaa agtggagctg ggaggtagaa ggggaggaca ctggtggttc tactgaccca 2820
actgggggca aaggtttgaa gacacagcct cccccgccag ccccaagctg ggccgaggcg 2880
cgtttgtgta tatctgcctc ccctgtctgt aaggagcagc gggaacggag cttcggagcc 2940
tcctcagtga aggtggtggg gctgccggat ctgggctgtg gggcccttgg gccacgctct 3000
tgaggaagcc caggctcgga ggaccctgga aaacagacgg gtctgagact gaaattgttt 3060
taccagctcc cagggtggac ttcagtgtgt gtatttgtgt aaatgggtaa aacaatttat 3120
ttctttttaa aaaaaaaaaa aaaaaaa 3147



<210> SEQ ID NO 2
<211> LENGTH: 855
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<223> OTHER INFORMATION: Amino acid sequence of TADG-15 encoded by nucleotides 23 to 2589 of Sequence 1
<400> SEQUENCE: 2

Met Gly Ser Asp Arg Ala Arg Lys Gly Gly Gly Gly Pro Lys Asp
5 10 15

Phe Gly Ala Gly Leu Lys Tyr Asn Ser Arg His Glu Lys Val Asn
20 25 30

Gly Leu Glu Glu Gly Val Glu Phe Leu Pro Val Asn Asn Val Lys
35 40 45

Lys Val Glu Lys His Gly Pro Gly Arg Trp Val Val Leu Ala Ala
50 55 60

Val Leu Ile Gly Leu Leu Leu Val Leu Leu Gly Ile Gly Phe Leu
65 70 75

Val Trp His Leu Gln Tyr Arg Asp Val Arg Val Gln Lys Val Phe
80 85 90

Asn Gly Tyr Met Arg Ile Thr Asn Glu Asn Phe Val Asp Ala Tyr
95 100 105

Glu Asn Ser Asn Ser Thr Glu Phe Val Ser Leu Ala Ser Lys Val
110 115 120

Lys Asp Ala Leu Lys Leu Leu Tyr Ser Gly Val Pro Phe Leu Gly
125 130 135

Pro Tyr His Lys Glu Ser Ala Val Thr Ala Phe Ser Glu Gly Ser
140 145 150

Val Ile Ala Tyr Tyr Trp Ser Glu Phe Ser Ile Pro Gln His Leu
155 160 165

Val Glu Glu Ala Glu Arg Val Met Ala Glu Glu Arg Val Val Met
170 175 180

Leu Pro Pro Arg Ala Arg Ser Leu Lys Ser Phe Val Val Thr Ser
185 190 195

Val Val Ala Phe Pro Thr Asp Ser Lys Thr Val Gln Arg Thr Gln
200 205 210

Asp Asn Ser Cys Ser Phe Gly Leu His Ala Arg Gly Val Glu Leu
215 220 225

Met Arg Phe Thr Thr Pro Gly Phe Pro Asp Ser Pro Tyr Pro Ala
230 235 240






His Ala Arg Cys Gln Trp Ala Leu Arg Gly Asp Ala Asp Ser Val
245 250 255

Leu Ser Leu Thr Phe Arg Ser Phe Asp Leu Ala Ser Cys Asp Glu
260 265 270

Arg Gly Ser Asp Leu Val Thr Val Tyr Asn Thr Leu Ser Pro Met
275 280 285

Glu Pro His Ala Leu Val Gln Leu Cys Gly Thr Tyr Pro Pro Ser
290 295 300

Tyr Asn Leu Thr Phe His Ser Ser Gln Asn Val Leu Leu Ile Thr
305 310 315

Leu Ile Thr Asn Thr Glu Arg Arg His Pro Gly Phe Glu Ala Thr
320 325 330

Phe Phe Gln Leu Pro Arg Met Ser Ser Cys Gly Gly Arg Leu Arg
335 340 345

Lys Ala Gln Gly Thr Phe Asn Ser Pro Tyr Tyr Pro Gly His Tyr
350 355 360

Pro Pro Asn Ile Asp Cys Thr Trp Asn Ile Glu Val Pro Asn Asn
365 370 375

Gln His Val Lys Val Ser Phe Lys Phe Phe Tyr Leu Leu Glu Pro
380 385 390

Gly Val Pro Ala Gly Thr Cys Pro Lys Asp Tyr Val Glu Ile Asn
395 400 405

Gly Glu Lys Tyr Cys Gly Glu Arg Ser Gln Phe Val Val Thr Ser
410 415 420

Asn Ser Asn Lys Ile Thr Val Arg Phe His Ser Asp Gln Ser Tyr
425 430 435

Thr Asp Thr Gly Phe Leu Ala Glu Tyr Leu Ser Tyr Asp Ser Ser
440 445 450

Asp Pro Cys Pro Gly Gln Phe Thr Cys Arg Thr Gly Arg Cys Ile
455 460 465

Arg Lys Glu Leu Arg Cys Asp Gly Trp Ala Asp Cys Thr Asp His
470 475 480

Ser Asp Glu Leu Asn Cys Ser Cys Asp Ala Gly His Gln Phe Thr
485 490 495

Cys Lys Asn Lys Phe Cys Lys Pro Leu Phe Trp Val Cys Asp Ser
500 505 510

Val Asn Asp Cys Gly Asp Asn Ser Asp Glu Gln Gly Cys Ser Cys
515 520 525

Pro Ala Gln Thr Phe Arg Cys Ser Asn Gly Lys Cys Leu Ser Lys
530 535 540

Ser Gln Gln Cys Asn Gly Lys Asp Asp Cys Gly Asp Gly Ser Asp
545 550 555

Glu Ala Ser Cys Pro Lys Val Asn Val Val Thr Cys Thr Lys His
560 565 570

Thr Tyr Arg Cys Leu Asn Gly Leu Cys Leu Ser Lys Gly Asn Pro
575 580 585

Glu Cys Asp Gly Lys Glu Asp Cys Ser Asp Gly Ser Asp Glu Lys
590 595 600

Asp Cys Asp Cys Gly Leu Arg Ser Phe Thr Arg Gln Ala Arg Val
605 610 615

Val Gly Gly Thr Asp Ala Asp Glu Gly Glu Trp Pro Trp Gln Val
620 625 630

Ser Leu His Ala Leu Gly Gln Gly His Ile Cys Gly Ala Ser Leu






635 640 645

Ile Ser Pro Asn Trp Leu Val Ser Ala Ala His Cys Tyr Ile Asp
650 655 660

Asp Arg Gly Phe Arg Tyr Ser Asp Pro Thr Gln Trp Thr Ala Phe
665 670 675

Leu Gly Leu His Asp Gln Ser Gln Arg Ser Ala Pro Gly Val Gln
680 685 690

Glu Arg Arg Leu Lys Arg Ile Ile Ser His Pro Phe Phe Asn Asp
695 700 705

Phe Thr Phe Asp Tyr Asp Ile Ala Leu Leu Glu Leu Glu Lys Pro
710 715 720

Ala Glu Tyr Ser Ser Met Val Arg Pro Ile Cys Leu Pro Asp Ala
725 730 735

Ser His Val Phe Pro Ala Gly Lys Ala Ile Trp Val Thr Gly Trp
740 745 750

Gly His Thr Gln Tyr Gly Gly Thr Gly Ala Leu Ile Leu Gln Lys
755 760 765

Gly Glu Ile Arg Val Ile Asn Gln Thr Thr Cys Glu Asn Leu Leu
770 775 780

Pro Gln Gln Ile Thr Pro Arg Met Met Cys Val Gly Phe Leu Ser
785 790 795

Gly Gly Val Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Leu Ser
800 805 810

Ser Val Glu Ala Asp Gly Arg Ile Phe Gln Ala Gly Val Val Ser
815 820 825

Trp Gly Asp Gly Cys Ala Gln Arg Asn Lys Pro Gly Val Tyr Thr
830 835 840

Arg Leu Pro Leu Phe Arg Asp Trp Ile Lys Glu Asn Thr Gly Val
845 850 855



<210> SEQ ID NO 3
<211> LENGTH: 256
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Serine protease catalytic domain of hepsin (Heps) homologous to similar domain in TADG-15
<400> SEQUENCE: 3

Arg Ile Val Gly Gly Arg Asp Thr Ser Leu Gly Arg Trp Pro Trp
5 10 15

Gln Val Ser Leu Arg Tyr Asp Gly Ala His Leu Cys Gly Gly Ser
20 25 30

Leu Leu Ser Gly Asp Trp Val Leu Thr Ala Ala His Cys Phe Pro
35 40 45

Glu Arg Asn Arg Val Leu Ser Arg Trp Arg Val Phe Ala Gly Ala
50 55 60

Val Ala Gln Ala Ser Pro His Gly Leu Gln Leu Gly Val Gln Ala
65 70 75

Val Val Tyr His Gly Gly Tyr Leu Pro Phe Arg Asp Pro Asn Ser
80 85 90

Glu Glu Asn Ser Asn Asp Ile Ala Leu Val His Leu Ser Ser Pro
95 100 105

Leu Pro Leu Thr Glu Tyr Ile Gln Pro Val Cys Leu Pro Ala Ala
110 115 120

Gly Gln Ala Leu Val Asp Gly Lys Ile Cys Thr Val Thr Gly Trp






125 130 135

Gly Asn Thr Gln Tyr Tyr Gly Gln Gln Ala Gly Val Leu Gln Glu
140 145 150

Ala Arg Val Pro Ile Ile Ser Asn Asp Val Cys Asn Gly Ala Asp
155 160 165

Phe Tyr Gly Asn Gln Ile Lys Pro Lys Met Phe Cys Ala Gly Tyr
170 175 180

Pro Glu Gly Gly Ile Asp Ala Cys Gln Gly Asp Ser Gly Gly Pro
185 190 195

Phe Val Cys Glu Asp Ser Ile Ser Arg Thr Pro Arg Trp Arg Leu
200 205 210

Cys Gly Ile Val Ser Trp Gly Thr Gly Cys Ala Leu Ala Gln Lys
215 220 225

Pro Gly Val Tyr Thr Lys Val Ser Asp Phe Arg Glu Trp Ile Phe
230 235 240

Gln Ala Ile Lys Thr His Ser Glu Ala Ser Gly Met Val Thr Gln
245 250 255

Leu



<210> SEQ ID NO 4
<211> LENGTH: 225
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Serine protease catalytic domain of Scce homologous to similar domain in TADG-15
<400> SEQUENCE: 4

Lys Ile Ile Asp Gly Ala Pro Cys Ala Arg Gly Ser His Pro Trp
5 10 15

Gln Val Ala Leu Leu Ser Gly Asn Gln Leu His Cys Gly Gly Val
20 25 30

Leu Val Asn Glu Arg Trp Val Leu Thr Ala Ala His Cys Lys Met
35 40 45

Asn Glu Tyr Thr Val His Leu Gly Ser Asp Thr Leu Gly Asp Arg
50 55 60

Arg Ala Gln Arg Ile Lys Ala Ser Lys Ser Phe Arg His Pro Gly
65 70 75

Tyr Ser Thr Gln Thr His Val Asn Asp Leu Met Leu Val Lys Leu
80 85 90

Asn Ser Gln Ala Arg Leu Ser Ser Met Val Lys Lys Val Arg Leu
95 100 105

Pro Ser Arg Cys Glu Pro Pro Gly Thr Thr Cys Thr Val Ser Gly
110 115 120

Trp Gly Thr Thr Thr Ser Pro Asp Val Thr Phe Pro Ser Asp Leu
125 130 135

Met Cys Val Asp Val Lys Leu Ile Ser Pro Gln Asp Cys Thr Lys
140 145 150

Val Tyr Lys Asp Leu Leu Glu Asn Ser Met Leu Cys Ala Gly Ile
155 160 165

Pro Asp Ser Lys Lys Asn Ala Cys Asn Gly Asp Ser Gly Gly Pro
170 175 180

Leu Val Cys Arg Gly Thr Leu Gln Gly Leu Val Ser Trp Gly Thr
185 190 195

Phe Pro Cys Gly Gln Pro Asn Asp Pro Gly Val Tyr Thr Gln Val
200 205 210






Cys Lys Phe Thr Lys Trp Ile Asn Asp Thr Met Lys Lys His Arg
215 220 225



<210> SEQ ID NO 5
<211> LENGTH: 225
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Serine protease catalytic domain of trypsin (Try) homologous to similar domain in TADG-15
<400> SEQUENCE: 5

Lys Ile Val Gly Gly Tyr Asn Cys Glu Glu Asn Ser Val Pro Tyr
5 10 15

Gln Val Ser Leu Asn Ser Gly Tyr His Phe Cys Gly Gly Ser Leu
20 25 30

Ile Asn Glu Gln Trp Val Val Ser Ala Gly His Cys Tyr Lys Ser
35 40 45

Arg Ile Gln Val Arg Leu Gly Glu His Asn Ile Glu Val Leu Glu
50 55 60

Gly Asn Glu Gln Phe Ile Asn Ala Ala Lys Ile Ile Arg His Pro
65 70 75

Gln Tyr Asp Arg Lys Thr Leu Asn Asn Asp Ile Met Leu Ile Lys
80 85 90

Leu Ser Ser Arg Ala Val Ile Asn Ala Arg Val Ser Thr Ile Ser
95 100 105

Leu Pro Thr Ala Pro Pro Ala Thr Gly Thr Lys Cys Leu Ile Ser
110 115 120

Gly Trp Gly Asn Thr Ala Ser Ser Gly Ala Asp Tyr Pro Asp Glu
125 130 135

Leu Gln Cys Leu Asp Ala Pro Val Leu Ser Gln Ala Lys Cys Glu
140 145 150

Ala Ser Tyr Pro Gly Lys Ile Thr Ser Asn Met Phe Cys Val Gly
155 160 165

Phe Leu Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly
170 175 180

Pro Val Val Cys Asn Gly Gln Leu Gln Gly Val Val Ser Trp Gly
185 190 195

Asp Gly Cys Ala Gln Lys Asn Lys Pro Gly Val Tyr Thr Lys Val
200 205 210

Tyr Asn Tyr Val Lys Trp Ile Lys Asn Thr Ile Ala Ala Asn Ser
215 220 225



<210> SEQ ID NO 6
<211> LENGTH: 231
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Serine protease catalytic domain of chymotrypsin (Chymb) homologous to similar domain in TADG-15
<400> SEQUENCE: 6

Arg Ile Val Asn Gly Glu Asp Ala Val Pro Gly Ser Trp Pro Trp
5 10 15

Gln Val Ser Leu Gln Asp Lys Thr Gly Phe His Phe Cys Gly Gly
20 25 30

Ser Leu Ile Ser Glu Asp Trp Val Val Thr Ala Ala His Cys Gly
35 40 45






Val Arg Thr Ser Asp Val Val Val Ala Gly Glu Phe Asp Gln Gly
50 55 60

Ser Asp Glu Glu Asn Ile Gln Val Leu Lys Ile Ala Lys Val Phe
65 70 75

Lys Asn Pro Lys Phe Ser Ile Leu Thr Val Asn Asn Asp Ile Thr
80 85 90

Leu Leu Lys Leu Ala Thr Pro Ala Arg Phe Ser Gln Thr Val Ser
95 100 105

Ala Val Cys Leu Pro Ser Ala Asp Asp Asp Phe Pro Ala Gly Thr
110 115 120

Leu Cys Ala Thr Thr Gly Trp Gly Lys Thr Lys Tyr Asn Ala Asn
125 130 135

Lys Thr Pro Asp Lys Leu Gln Gln Ala Ala Leu Pro Leu Leu Ser
140 145 150

Asn Ala Glu Cys Lys Lys Ser Trp Gly Arg Arg Ile Thr Asp Val
155 160 165

Met Ile Cys Ala Gly Ala Ser Gly Val Ser Ser Cys Met Gly Asp
170 175 180

Ser Gly Gly Pro Leu Val Cys Gln Lys Asp Gly Ala Trp Thr Leu
185 190 195

Val Gly Ile Val Ser Trp Gly Ser Asp Thr Cys Ser Thr Ser Ser
200 205 210

Pro Gly Val Tyr Ala Arg Val Thr Lys Leu Ile Pro Trp Val Gln
215 220 225

Lys Ile Leu Ala Ala Asn
230



<210> SEQ ID NO 7
<211> LENGTH: 255
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Serine protease catalytic domain of factor 7 (Fac7) homologous to similar domain in TADG-15
<400> SEQUENCE: 7

Arg Ile Val Gly Gly Lys Val Cys Pro Lys Gly Glu Cys Pro Trp
5 10 15

Gln Val Leu Leu Leu Val Asn Gly Ala Gln Leu Cys Gly Gly Thr
20 25 30

Leu Ile Asn Thr Ile Trp Val Val Ser Ala Ala His Cys Phe Asp
35 40 45

Lys Ile Lys Asn Trp Arg Asn Leu Ile Ala Val Leu Gly Glu His
50 55 60

Asp Leu Ser Glu His Asp Gly Asp Glu Gln Ser Arg Arg Val Ala
65 70 75

Gln Val Ile Ile Pro Ser Thr Tyr Val Pro Gly Thr Thr Asn His
80 85 90

Asp Ile Ala Leu Leu Arg Leu His Gln Pro Val Val Leu Thr Asp

95 100 105 

His Val Val Pro Leu Cys Leu Pro Glu Arg Thr Phe Ser Glu Arg 

110 115 120 

Thr Leu Ala Phe Val Arg Phe Ser Leu Val Ser Gly Trp Gly Gin 

125 130 135 



Leu Leu Asp Arg Gly Ala Thr Ala Leu Glu Leu Met Val Leu Asn 

140 145 150 






Val Pro Arg Leu Met Thr Gln Asp Cys Leu Gln Gln Ser Arg Lys
155 160 165

Val Gly Asp Ser Pro Asn Ile Thr Glu Tyr Met Phe Cys Ala Gly
170 175 180

Tyr Ser Asp Gly Ser Lys Asp Ser Cys Lys Gly Asp Ser Gly Gly
185 190 195

Pro His Ala Thr His Tyr Arg Gly Thr Trp Tyr Leu Thr Gly Ile
200 205 210

Val Ser Trp Gly Gln Gly Cys Ala Thr Val Gly His Phe Gly Val
215 220 225

Tyr Thr Arg Val Ser Gln Tyr Ile Glu Trp Leu Gln Lys Leu Met

230 235 240 

Arg Ser Glu Pro Arg Pro Gly Val Leu Leu Arg Ala Pro Phe Pro 

245 250 255 



<210> SEQ ID NO 8
<211> LENGTH: 253
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:
<223> OTHER INFORMATION: Serine protease catalytic domain of tissue plasminogen activator (Tpa) homologous to similar domain in TADG-15
<400> SEQUENCE: 8

Arg Ile Lys Gly Gly Leu Phe Ala Asp Ile Ala Ser His Pro Trp
5 10 15

Gln Ala Ala Ile Phe Ala Lys His Arg Arg Ser Pro Gly Glu Arg
20 25 30

Phe Leu Cys Gly Gly Ile Leu Ile Ser Ser Cys Trp Ile Leu Ser
35 40 45

Ala Ala His Cys Phe Gln Glu Arg Phe Pro Pro His His Leu Thr
50 55 60

Val Ile Leu Gly Arg Thr Tyr Arg Val Val Pro Gly Glu Glu Glu
65 70 75

Gln Lys Phe Glu Val Glu Lys Tyr Ile Val His Lys Glu Phe Asp
80 85 90

Asp Asp Thr Tyr Asp Asn Asp Ile Ala Leu Leu Gln Leu Lys Ser
95 100 105

Asp Ser Ser Arg Cys Ala Gln Glu Ser Ser Val Val Arg Thr Val
110 115 120

Cys Leu Pro Pro Ala Asp Leu Gln Leu Pro Asp Trp Thr Glu Cys
125 130 135

Glu Leu Ser Gly Tyr Gly Lys His Glu Ala Leu Ser Pro Phe Tyr
140 145 150

Ser Glu Arg Leu Lys Glu Ala His Val Arg Leu Tyr Pro Ser Ser
155 160 165

Arg Cys Thr Ser Gln His Leu Leu Asn Arg Thr Val Thr Asp Asn
170 175 180

Met Leu Cys Ala Gly Asp Thr Arg Ser Gly Gly Pro Gln Ala Asn
185 190 195

Leu His Asp Ala Cys Gln Gly Asp Ser Gly Gly Pro Leu Val Cys
200 205 210

Leu Asn Asp Gly Arg Met Thr Leu Val Gly Ile Ile Ser Trp Gly
215 220 225

Leu Gly Cys Gly Gln Lys Asp Val Pro Gly Val Tyr Thr Lys Val
230 235 240






Thr Asn Tyr Leu Asp Trp Ile Arg Asp Asn Met Arg Pro
245 250

<210> SEQ ID NO 9 

<21l> LENGTH: 2900 

<212> TYPE: DNA 

<2 1 3> ORGANISM : Homo sapiens 

<220> FEATURE; 

<223> OTHER INFORMATION: SNC19 mRNA sequence (U20 428) 
<4 00> SEQUENCE: 9 



cgctgggtgg 


tgctggcagc 


cgtgctgatc 


ggcctcctct 


tggtcttgct 


ggggatcggc 


60 


ttcctggtgt 


ggcatt-tgca 


gtaccgggac 


gtgcgtgtcc 


agaaggtctt 


caatggctac 


120 


atgaggatca 


csaatgagaa 


ttttgtggat 


gcctacgaga 


actccaact-c 


cactgagttt 


180 


gtaagcctgg 


ccagcaaggt 


gaaggacgcg 


ctgaagctgc 


tgtacagcgg 


agtcccattc 


240 


ctgggcccct 


accacaagga 


gtcggctgtg 


acggccttca 


gcgagggcag 


cgtcatcgcc 


300 


tactactggt 


ctgagttcag 


catcccgcag 


cacctggttg 


aggaggccga 


gcgcgtcatg 


360 


gccaggagcg 


cgtagt-catg 


ctgcccccgc 


gggcgcgctc 


cctgaagtcc 


tttgtggtca 


420 


cctcagtggt 


ggctttcccc 


acggactcca 


aaacag-taca 


gaggacccag 


gacaacagct 


480 


gcagctttgg 


cctgcacgcc 


gcgg-tgtgga 


gctgatgcgc 


ttcaccacgc 


cggcttccct 


540 


gacagcccct 


accccgctca 


tgcccgctgc 


cagtgggctg 


cggggacgcg 


acgcagtgct 


600 


gagctactcg 


agcbgactcg 


cagcttgact 


gcgcctcgac 


gagcgcggca 


gcgacctggt 


660 


gacgtgtaca 


acaccctgag 


ccccatggag 


ccccacgcct 


ggtgagtgtg 


tggcacctac 


720 


cctccctcct acaacctgac cttccactcc ctcccacgaa cgtcctgctc atcacactga 780
taaccaacac tgacgcggca tcccggcttt gaggccacct tcttccagct gcctaggatg 840
agcagctgtg gaggccgctt acgtaaagcc caggggacat tcaacagccc ctactaccca 900
ggccactacc cacccaacat tgactgcaca tggaaaattg aggtgcccaa caaccagcat 960
gtgaaggtgc gcttcaaatt cttctacctg ctggagcccg gcgtgcctgc gggcacctgc 1020
cccaaggact acgtggagat caatggggag aaatactgcg gagagaggtc ccagttcgtc 1080
gtcaccagca acagcaacaa gatcacagtt cgcttccact cagatcagtc ctacaccgac 1140
accggcttct tagctgaata cctctcctac gactccagtg acccatgccc ggggcagttc 1200
acgtgccgca cggggcggtg tatccggaag gagctgcgct gtgatggctg ggcgactgca 1260
ccgaccacag cgatgagctc aactgcagtt gcgacgccgg ccaccagttc acgtgcaaga 1320
gcaagttctg caagctcttc tgggtctgcg acagtgtgaa cgagtgcgga gacaacagcg 1380
acgagcaggg ttgcatttgt ccggacccag accttcaggt gttccaatgg gaagtgcctc 1440
tcgaaaagcc agcagtgcaa tgggaaggac gactgtgggg acgggtccga cgaggcctcc 1500
tgccccaagg tgaacgtcgt cacttgtacc aaacacacct accgctgcct caatgggctc 1560
tgcttgagca agggcaaccc tgagtgtgac gggaaggagg actgtagcga cggctcagat 1620
gagaaggact gcgactgtgg gctgcggtca ttcacgagac aggctcgtgt tgttgggggc 1680
acggatgcgg atgagggcga gtggccctgg caggtaagcc tgcatgctct gggccagggc 1740
cacatctgcg gtgcttccct catctctccc aactggctgg tctctgccgc acactgctac 1800
atcgatgaca gaggattcag gtactcagac cccacgcagg acggccttcc tgggcttgca 1860
cgaccagagc cagcgcaggc cctggggtgc aggagcgcag gctcaagcgc atcatctccc 1920
accccttctt caatgacttc accttcgact atgacatcgc gctgctggag ctggagaaac 1980
cggcagagta cagctccatg gtgcggccca tctgcctgcc ggacgcctgc catgtcttcc 2040
ctgccggcaa ggccatctgg gtcacgggct ggggacacac ccagtatgga ggcactggcg 2100
cgctgatcct gcaaaagggt gagatccgcg tcatcaacca gaccacctgc gagaacctcc 2160
tgccgcagca gatcacgccg cgcatgatgt gcgtgggctt cctcagcggc ggcgtggact 2220
cctgccaggg tgattccggg ggacccctgt ccagcgtgga ggcggatggg cggatcttcc 2280
aggccggtgt ggtgagctgg ggagacgctg cgctcagagg aacaagccag gcgtgtacac 2340
caggctccct ctgtttcggg aatggatcaa agagaacact ggggtatagg ggccggggcc 2400
acccaaatgt gtacacctgc ggggccaccc atcgtccacc ccagtgtgca cgcctgcagg 2460
ctggagactc gcgcaccgtg acctgcacca gcgccccaga acatacactg tgaactcatc 2520
tccaggctca aatctgctag aaaacctctc gcttcctcag cctccaaagt ggagctggga 2580
gggtagaagg ggaggaacac tggtggttct actgacccaa ctggggcaag gtttgaagca 2640
cagctccggc agcccaagtg ggcgaggacg cgtttgtgca tactgccctg c^ctat^acac 2700
ggaagacctg gatctctagt gagtgtgact gccggatctg gctgtggtcc ttggccacgc 2760
ttcttgagga agcccaggct cggaggaccc tggaaaacag acgggtctga gactgaaaat 2820
ggtttaccag ctcccaggtg acttcagtgt gtgtattgtg taaatgagta aaacatttta 2880


tttcttttta aaaaaaaaaa

<210> SEQ ID NO 10
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: primer_bind
<222> LOCATION: 1-20
<223> OTHER INFORMATION: Forward primer for analysis of overexpression
of TADG-15 mRNA by quantitative PCR.

<400> SEQUENCE: 10

atgacagagg attcaggtac 20

<210> SEQ ID NO 11
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: primer_bind
<222> LOCATION: 1-20
<223> OTHER INFORMATION: Reverse primer for analysis of overexpression
of TADG-15 mRNA by quantitative PCR.

<400> SEQUENCE: 11

gaaggtgaag tcattgaaga 20
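SEQ ID NO 10 and SEQ ID NO 11 are described as the forward and reverse primers for quantitative PCR of TADG-15 mRNA. The short Python sketch below is an illustration only and is not part of the patent. It locates that primer pair on the SEQ ID NO 1 rows ending at positions 1860, 1920, and 1980, as transcribed above, and reports the product size the pair would give. The helper names are hypothetical.

```python
# Minimal sketch: map the SEQ ID NO 10/11 primer pair onto a fragment of the
# TADG-15 cDNA listing and report the expected PCR product.

COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str) -> tuple[int, int]:
    """Return 0-based, half-open (start, end) coordinates of the product.

    The forward primer matches the template as written; the reverse primer
    anneals to it, so its reverse complement must appear in the template.
    """
    start = template.find(forward)
    rc_rev = reverse_complement(reverse)
    site = template.find(rc_rev)
    if start < 0 or site < 0:
        raise ValueError("primer site not found in template")
    return start, site + len(rc_rev)


# Listing rows ending at positions 1860, 1920, and 1980 (180 nt in total).
FRAGMENT = (
    "atcgatgacagaggattcaggtactcagaccccacgcaggacggccttcctgggcttgca"
    "cgaccagagccagcgcaggccctggggtgcaggagcgcaggctcaagcgcatcatctccc"
    "accccttcttcaatgacttcaccttcgactatgacatcgcgctgctggagctggagaaac"
)
FORWARD = "atgacagaggattcaggtac"  # SEQ ID NO 10
REVERSE = "gaaggtgaagtcattgaaga"  # SEQ ID NO 11
OFFSET = 1801  # listing position of FRAGMENT[0]

start, end = amplicon(FRAGMENT, FORWARD, REVERSE)
print(f"product spans {start + OFFSET}-{end + OFFSET - 1}, {end - start} bp")
```

The reverse primer is matched through its reverse complement because it anneals to the opposite strand. That is also how it can be read against the sense-strand listing.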

<210> SEQ ID NO 12
<211> LENGTH: 17
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: primer_bind
<222> LOCATION: 1-17
<223> OTHER INFORMATION: Forward primer for analysis of β-tubulin mRNA
expression by quantitative PCR.

<400> SEQUENCE: 12

tgcattgaca acgaggc 17



<210> SEQ ID NO 13
<211> LENGTH: 17



5,972,616 




<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: primer_bind
<222> LOCATION: 1-17
<223> OTHER INFORMATION: Forward primer for analysis of β-tubulin mRNA
expression by quantitative PCR.

<400> SEQUENCE: 13

ctgtcttgac attgttg 17



What is claimed is: 

1. DNA encoding a Tumor Antigen Derived Gene-15
(TADG-15) protein selected from the group consisting of:

(a) isolated DNA which encodes a TADG-15 protein; 

(b) isolated DNA which hybridizes to isolated DNA of (a) 
above and which encodes a TADG-15 protein; and 


(c) isolated DNA differing from the isolated DNAs of (a) 
and (b) above in codon sequence due to the degeneracy 
of the genetic code, and which encodes a TADG-15 
protein. 

2. The DNA of claim 1, wherein said DNA has the
sequence shown in SEQ ID No:1.

3. The DNA of claim 1, wherein said TADG-15 protein 
has the amino acid sequence shown in SEQ ID No: 2. 

4. A vector comprising the DNA of claim 1 and regulatory 
elements necessary for expression of the DNA in a cell. 

5. The vector of claim 4, wherein said DNA encodes a 
TADG-15 protein having the amino acid sequence shown in 
SEQ ID No:2. 

6. A host cell transfected with the vector of claim 4, said 
vector expressing a TADG-15 protein. 

7. The host cell of claim 6, wherein said cell is selected 
from group consisting of bacterial cells, mammalian cells, 
plant cells and insect cells. 



8. The host cell of claim 7, wherein said bacterial cell is 
E. coli. 

9. Isolated and purified TADG-15 protein coded for by 
DNA selected from the group consisting of: 

(a) isolated DNA which encodes a TADG-15 protein; 

(b) isolated DNA which hybridizes to isolated DNA of (a) 
above and which encodes a TADG-15 protein; and 

(c) isolated DNA differing from the isolated DNAs of (a) 
and (b) above in codon sequence due to the degeneracy 
of the genetic code, and which encodes a TADG-15 
protein. 

10. The isolated and purified TADG-15 protein of claim 
9 having the amino acid sequence shown in SEQ ID No:2. 

11. A method of detecting expression of the protein of 
claim 9, comprising the steps of: 

(a) contacting mRNA obtained from a cell with a labeled 
hybridization probe; and 

(b) detecting hybridization of the probe with the mRNA.

ABSTRACT L-Histidine, 90% enriched at the C2 position, was incorporated into the catalytic triad of α-lytic protease (EC 3.4.21.12) with the aid of a histidine-requiring mutant of Lysobacter enzymogenes (ATCC 29487), and the pH dependence of the coupling constant between this carbon atom and its directly bonded proton was reinvestigated. The high degree of specific ¹³C isotopic enrichment attainable with the auxotroph permits direct observation and measurement of this coupling constant in proton-coupled NMR spectra at 67.89 MHz and at 15.1 MHz. In contrast to the earlier study, the present results indicate that this coupling constant does respond to a microscopic ionization with pKa near 7.0; moreover, the magnitudes of the ¹J(C2-H) values observed are in accord with those expected for titration of the histidyl residue. We conclude that the original measurement must be in error and that this coupling constant now also supports a histidyl residue that titrates more or less normally as a component of the catalytic triad of serine proteases.

A "catalytic triad" composed of the side-chain functional groups of aspartic acid, histidine, and serine has thus far proved to be an invariant feature of the active sites of serine proteinases, as demonstrated by x-ray diffraction studies (1-6). The ubiquity and diversity of individual enzymes belonging to this class suggest that this array of Asp-His-Ser residues possesses special catalytic properties. The precise mode of operation of this triad in serine protease-catalyzed hydrolysis of amides and esters is, therefore, of considerable interest.

A prerequisite to understanding the effectiveness of this triad is a knowledge of the ionization behavior of its component functional groups, and this has been a controversial issue. A histidyl residue is essential for activity (7-10). Because the activities of serine proteinases increase with pH in a manner indicative of the titration of a single group having a pKa of about 7.0 (11), this ionization was originally assumed to represent that of the particular histidyl residue. However, Hunkapiller et al. (12) proposed that this pKa of 7.0 should instead be assigned to the aspartic acid residue and that the histidyl residue should be assigned a pKa of less than 4.0. The experimental basis for this proposal was a determination that the coupling constant between C2 of the histidyl residue in the catalytic triad of α-lytic protease and its directly bonded proton was independent of pH over the range 4.0-8.0 and indicative of a neutral imidazole ring. The result of this effective reversal of normal pKa assignments is to make the aspartic acid carboxylate the ultimate charge donor in the operation of the so-called "charge-relay" mechanism (1, 12) of attack on the peptide bond.

The hypothesis that histidyl residues in the catalytic triads of serine proteases are abnormally weak bases, whereas the corresponding aspartic acid residues are abnormally weak acids, has received considerable support, both experimental (13-18) and theoretical (19-23). There are, however, other experimental results (24-28) that indicate more normal ionization behavior; at one time, substantial controversy on this point existed. Recent ¹⁵N (29) and ¹H NMR (30-32) studies strongly indicate that histidyl residues at the catalytic site titrate more or less normally. Nevertheless, the experimental data originally supporting the pKa-reversal hypothesis remain to be reconciled with these studies. Especially troublesome are the measurements of the histidyl ¹J(C2-H) coupling constant for α-lytic protease (12), because this result is difficult to attribute to anything but a histidyl residue with an abnormally low pKa.

The reported measurements of ¹J(C2-H) […] difficulties. A major problem is that the difference in magnitude of this coupling constant between the protonated (≈218 Hz) and neutral (≈208 Hz) forms of the imidazole ring is small. Its measurement in α-lytic protease was also hampered by large linewidths and by background natural-abundance resonances that obscured one line of the doublet. Determination of the coupling therefore required measurement of J/2 or the taking of difference spectra. Indeed, whether this measurement could be made with sufficient precision under these circumstances has been questioned (26, 33).
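The argument turns on a difference of only about 10 Hz between the two limiting couplings, read here as about 218 Hz for the protonated form and about 208 Hz for the neutral form. A hedged way to see what each pKa assignment predicts is to assume fast exchange. Under that assumption, the observed coupling is the population-weighted average of the two limits, with the protonated fraction given by Henderson-Hasselbalch. The fast-exchange model and the illustrative pKa of 3.0 (standing in for "less than 4.0") are assumptions for this sketch and are not taken from the paper.

```python
# Hedged sketch: observed one-bond C2-H coupling for a single titrating
# histidine, assuming fast exchange between protonated and neutral forms.

J_PROTONATED = 218.0  # Hz, approximate limiting value as read from the text
J_NEUTRAL = 208.0     # Hz, approximate limiting value as read from the text


def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch fraction of the protonated (imidazolium) form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def observed_coupling(ph: float, pka: float) -> float:
    """Population-weighted 1J(C2-H) in the fast-exchange limit."""
    f = fraction_protonated(ph, pka)
    return f * J_PROTONATED + (1.0 - f) * J_NEUTRAL


# pKa 7.0: "normal" histidine; pKa 3.0: an example of the pKa < 4 hypothesis.
for pka in (7.0, 3.0):
    values = [round(observed_coupling(ph, pka), 1) for ph in (4.0, 6.0, 8.0)]
    print(f"pKa {pka}: J at pH 4, 6, 8 = {values} Hz")
```

Under these assumptions, a pKa near 7 predicts a coupling that falls from about 218 Hz toward 208 Hz across pH 4-8. A pKa below 4 predicts a nearly constant value close to the neutral limit, which is what the original study reported.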

Improved NMR instrumentation operating at higher magnetic field offers the possibility of enhancing the accuracy of the measurements because, at higher fields, interference from background natural-abundance signals should be substantially reduced. Also, a histidine-requiring mutant of Lysobacter enzymogenes is now available which allows one to achieve a higher specific enrichment and, thus, to obtain improved signal detection and resolution. In view of these improved prospects for measuring this coupling constant and the difficulties associated with the earlier study, we report here a reexamination of its pH dependence in α-lytic protease.

MATERIALS AND METHODS 

L-Histidine, selectively enriched with ¹³C at C2, was obtained from Isotope Labelling (Whippany, NJ) or KOR Isotopes (Cambridge, MA), or was synthesized from L-2,5-diamino-4-ketovaleric acid and KS¹³CN as described by Ashley and Harrington (34) and Heath et al. (35). Each preparation was judged by NMR spectroscopy to be roughly equivalent in purity and specific ¹³C enrichment (≈92%). Ac-L-Ala-L-Pro-L-Ala-p-nitroanilide was synthesized as described by Hunkapiller et al. (36) and used to assay the activity of the enzyme.

The ¹³C-labeled histidyl α-lytic protease was prepared and purified by culturing a histidine-requiring mutant of L. enzymogenes using the previously described procedures (12, 29). The peptidase activity of α-lytic protease was assayed against Ac-L-Ala-L-Pro-L-Ala-p-nitroanilide (4 × 10^[?] M in 0.05 M Tris buffer, pH 8.75, at 25°C). Based on A[?] = 8.9, purified preparations of α-lytic protease used in these NMR studies exhibited kcat/Km values of 2.0 × 10^[?] M⁻¹s⁻¹, as compared to a value of 1.5 × 10^[?] M⁻¹s⁻¹ reported previously (36).

Fig. 1. Proton-decoupled 67.89-MHz ¹³C NMR spectra of α-lytic protease. (A) [2-¹³C]Histidyl-enriched α-lytic protease (≈3 mM at pH 4.7; 6400 scans with a recycle time of 0.84 sec). (B) Natural-abundance α-lytic protease (≈8 mM at pH 6.0; 46,000 scans with a recycle time of 2 sec).

* Presented in part at the Ninth International Conference on Magnetic Resonance in Biological Systems, Bender, France, September 1-6, […]

¹³C NMR spectra were recorded at 67.89 MHz on a Bruker HX-270 spectrometer and at 15.08 MHz on a Bruker WP-60 spectrometer; 10-mm probes were used with both instruments. The NMR samples were 1-5 mM in α-lytic protease and were prepared by dissolving lyophilized powders of enzyme in 0.1 M KCl. About 15% ²H₂O was added to provide an internal field-frequency lock signal. The relatively sharp signal in ¹³C NMR spectra of α-lytic protease arising from the guanidinium carbons of the 12 arginine residues (previously assigned a chemical shift of 157.25 ppm relative to tetramethylsilane) was used as an internal reference. Its position relative to internal dioxane was first verified to be the same at high and low pH. Chemical shifts are reported in ppm from tetramethylsilane.

Fig. 2. Proton-coupled 67.89-MHz ¹³C NMR spectra of [2-¹³C]histidyl-enriched α-lytic protease. (A) Enzyme (1.6 mM) at pH 5.54 (25,300 scans with a recycle time of 0.84 sec). (B) Enzyme (1.3 mM) at pH 8.24 (38,500 scans with a recycle time of 0.84 sec).

Fig. 3. Comparison of representative high and low pH doublets from 67.89-MHz proton-coupled spectra of [2-¹³C]histidyl-enriched α-lytic protease: enzyme (1.34 mM) at pH 8.24 (38,650 scans) and 1.5 mM enzyme at pH 5.25 (51,960 scans).

In general, 67.89-MHz ¹³C spectra were acquired by using a 90° radiofrequency pulse (26 μs), a spectral width of 16,000 Hz, and 8000 data points. The ¹³C spectra at 15.08 MHz were acquired with a 90° pulse (21 μs), a spectral width of 4000 Hz, and 2000 data points.

The pH of the solution and the specific activity of the enzyme 
were checked both before and after recording each spectrum; 
only for those samples which exhibited no discernible change 
in these parameters are spectra reported here. The pH of the 
sample was varied by the addition of 0.25-0.5 M NaOH or HCl. 



RESULTS AND DISCUSSION 

Representative proton-decoupled 67.89-MHz ¹³C NMR spectra of unlabeled α-lytic protease and of [2-¹³C]histidyl-labeled α-lytic protease are compared in Fig. 1. The large single resonance at 135 ppm, present only in the spectrum of the isotopically enriched enzyme, is clearly that of the ¹³C-labeled carbon of the histidyl residue. The pH dependence of the chemical shift of this resonance is the same as reported earlier (12). Representative proton-coupled ¹³C NMR spectra at high and low pH are shown in Fig. 2. Both lines of the doublet are now clearly resolved at high and low pH, so ¹J(C-H) can be measured directly from the peak separation. Six independent determinations of ¹J(C-H) were made at pH values of 4.66, 5.25, 5.35, 5.47, 5.54, and 6.02; they gave values of 219, 217, 219, 217, 217, and 216 Hz, respectively. Two determinations of ¹J(C-H) at pH 8.24 and 8.44 gave values of 208 and 204 Hz, respectively. Either Lorentzian or parabolic interpolation of the peak positions yielded the same value for ¹J(C-H). The curves in Fig. 3 for representative high and low pH doublets demonstrate that ¹J(C-H) does change with pH.
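The Results note that Lorentzian or parabolic interpolation of the peak positions gave the same ¹J(C-H). Below is a minimal Python sketch of three-point parabolic interpolation applied to a synthetic Lorentzian doublet. The splitting, center frequency, 20 Hz linewidth, and 2 Hz point spacing are illustrative assumptions, not the paper's acquisition parameters. The function names are hypothetical.

```python
# Hedged sketch: estimate a doublet splitting from digitized spectrum points
# by refining each peak maximum with a three-point parabolic fit.


def lorentzian(f: float, f0: float, fwhm: float) -> float:
    """Unit-height Lorentzian line centered at f0."""
    half = fwhm / 2.0
    return 1.0 / (1.0 + ((f - f0) / half) ** 2)


def parabolic_peak(freqs: list[float], ys: list[float], i: int) -> float:
    """Fit a parabola through samples i-1, i, i+1 and return its vertex."""
    y0, y1, y2 = ys[i - 1], ys[i], ys[i + 1]
    step = freqs[i + 1] - freqs[i]
    delta = 0.5 * (y0 - y2) / (y0 - 2.0 * y1 + y2)  # vertex offset in points
    return freqs[i] + delta * step


def doublet_splitting(freqs: list[float], ys: list[float]) -> float:
    """Separation between the two tallest local maxima, in frequency units."""
    peaks = [i for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] >= ys[i + 1]]
    top_two = sorted(sorted(peaks, key=lambda i: ys[i])[-2:])
    left, right = (parabolic_peak(freqs, ys, i) for i in top_two)
    return right - left


# Synthetic proton-coupled doublet: J = 217 Hz, 20 Hz linewidth, 2 Hz/point.
J_TRUE, CENTER, FWHM, STEP = 217.0, 1000.3, 20.0, 2.0
freqs = [k * STEP for k in range(1001)]
ys = [
    lorentzian(f, CENTER - J_TRUE / 2, FWHM) + lorentzian(f, CENTER + J_TRUE / 2, FWHM)
    for f in freqs
]
print(f"estimated J = {doublet_splitting(freqs, ys):.2f} Hz")
```

The parabola only uses the three highest points of each line, so it recovers a peak position between grid points. A full Lorentzian fit would use more of the lineshape. For well-separated, reasonably sampled lines the two approaches should agree closely.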

In addition to the high-field ¹³C NMR measurements at 67.89 MHz, the coupling constant was also determined by ¹³C NMR spectroscopy at 15.1 MHz. Even at this lower magnetic field, both lines of the doublet were sufficiently resolved to allow direct measurement of the coupling. Two independent determinations of the coupling constant in both the high and low pH ranges gave effectively the same results as the measurements at 67.89 MHz.

The present results indicate that this coupling constant does respond to an ionization of the histidyl residue with a pKa near 7.0, and that the original measurements (12) must be in error. The source of this error is, at present, not clear. It possibly derives from the presence of multiple forms of the enzyme (31) at acidic pH. These forms can be resolved at 125 MHz, where they are in slow exchange (R. J. Kaiser and T. C. Perkins, personal communication).
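The coupling constants reported in the Results can be summarized with the Python `statistics` module. The values below are transcribed from the text; grouping them into "low pH" (4.66-6.02) and "high pH" (8.24-8.44) sets follows the text's own split. The summary shows a shift of roughly 11 Hz on raising the pH, comparable in size to the difference between the protonated and neutral limits quoted in the introduction.

```python
# Hedged sketch: summarize the 67.89-MHz 1J(C-H) values reported in the Results.
from statistics import mean, stdev

LOW_PH = [(4.66, 219), (5.25, 217), (5.35, 219), (5.47, 217), (5.54, 217), (6.02, 216)]
HIGH_PH = [(8.24, 208), (8.44, 204)]  # (pH, coupling in Hz)

low = [j for _, j in LOW_PH]
high = [j for _, j in HIGH_PH]
low_mean = mean(low)
high_mean = mean(high)

print(f"low pH:  {low_mean:.1f} +/- {stdev(low):.1f} Hz (n={len(low)})")
print(f"high pH: {high_mean:.1f} Hz (n={len(high)})")
print(f"change on raising pH: {low_mean - high_mean:.1f} Hz")
```

With only two high-pH points, no standard deviation is quoted for that group; the sketch is a descriptive summary, not a statistical test.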

Consequently, the NMR data (¹⁵N, ¹³C, and ¹H) now support a histidyl residue that titrates more or less normally as a component of the active-site catalytic triads of serine proteases, at least for the free enzyme in solution. Other experimental or theoretical studies that support the pKa-reversal hypothesis, as well as mechanistic schemes based upon it, need reappraisal.

This work was supported by grants from the National Institutes of Health (GM-27927 and GM 164221) and from Research Corporation. The high-field NMR experiments were performed at the NMR Facility for Biomolecular Research located at the F. Bitter National Magnet Laboratory (Massachusetts Institute of Technology). The NMR Facility is supported by Grant RR00995 from the Division of Research Resources of the National Institutes of Health and by National Science Foundation Contract C-670.

Conjugates with Dyes, Haptens, and Cross-Linking Reagents

Michael Brinkley 

Molecular Probes, Inc., 4849 Pitchford Avenue, Eugene, OR 97402

I. INTRODUCTION

Modification of proteins, DNA, and other biopolymers by labeling them with reporter molecules has become a very powerful research tool in immunology, histochemistry, and cell biology. A number of excellent reviews of this subject have been published (…). In addition, there is a growing number of commercial applications of these modified biomolecules, including clinical immunoassays, DNA hybridization tests, and gene fusion detection systems. In these techniques, a small molecule with special properties, such as fluorescence or binding specificity, is covalently bound to a protein, a DNA strand, or another biomolecule. Specific examples include fluorescent-labeled antibodies for detection and localization of cell-surface antigens, biotin-labeled single-stranded DNA probes for detection of DNA hybridization, and hapten-labeled proteins that, when introduced into a suitable host animal, generate hapten-specific antibodies.

This review will focus on the experimental design and procedures for preparing protein conjugates with dyes, biotin, and haptens such as drugs and hormones. Methods for covalently linking two unlike biopolymers through the judicious choice of cross-linking reagents will also be discussed. The following specific topics will be addressed: (a) reactive groups of proteins that are available for modification, including those of their naturally occurring amino acids and reactive groups introduced by chemical modification; (b) reagents that can be used to couple molecules to these reactive sites; (c) experimental procedures for preparing conjugates; (d) purification and isolation of conjugates; and (e) techniques for determining the degree of labeling.

II. GENERAL DISCUSSION OF METHODS 

A. Reactive Groups of Proteins. Proteins and peptides are amino acid polymers containing a number of reactive side chains. In addition to, or as an alternative to, these intrinsic reactive groups, specific reactive moieties can be introduced into the polymer chain by chemical modification. These groups, whether they are naturally part of the protein or artificially introduced, serve as "handles" for attaching a wide variety of molecules, including other proteins. The intrinsic reactive groups of proteins are described in the following section.

(1) Amines (Lysines, α-Amino Groups). One of the most common reactive groups of proteins is the aliphatic ε-amine of the amino acid lysine. Lysines are usually present to some extent and are often quite abundant. For example, the protein bovine insulin contains only a single lysine amine, while avidin, a protein found in egg whites, contains 36 lysines (7). Lysine amines are reasonably good nucleophiles above pH 8.0 (pKa = 9.18) (8), and they therefore react easily and cleanly with a variety of reagents to form stable bonds (eq 1).

$$\text{Protein-NH}_2 + \text{RX} \longrightarrow \text{Protein-NHR} + \text{HX} \qquad (1)$$

Other reactive amines found in proteins are the α-amino groups of the N-terminal amino acids. The α-amino groups are less basic than lysines and are reactive at around pH 7.0. Sometimes they can be selectively modified in the presence of lysines. There is usually at least one α-amino group in a protein. Proteins that have multiple peptide chains or several subunits can have more, one for each peptide chain or subunit. Bovine insulin has one N-terminal glycine residue and one N-terminal phenylalanine (9). Some proteins, such as cytochrome c and ovalbumin, do not possess free α-amino groups. In these molecules the N-terminal amino group is N-acylated and therefore not reactive toward the usual modification reagents. Either N-terminal amines or lysines are almost always present in any given protein or peptide, and they are easily reacted. The most commonly used method of protein modification is therefore through these aliphatic amine groups.
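Lysine amines become "reasonably good nucleophiles above pH 8.0" because only the unprotonated amine reacts, and the text quotes a pKa of 9.18. The Henderson-Hasselbalch sketch below makes that concrete. It ignores the local protein environment, which shifts the pKa of individual residues, so treat it as a first approximation. The function name is hypothetical.

```python
# Hedged sketch: fraction of lysine epsilon-amines present as the reactive
# free base (R-NH2 rather than R-NH3+) at a given pH, from Henderson-Hasselbalch.

LYSINE_PKA = 9.18  # value quoted in the text


def fraction_unprotonated(ph: float, pka: float = LYSINE_PKA) -> float:
    """Fraction of the amine in the unprotonated (nucleophilic) form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


for ph in (7.0, 8.0, 8.5, 9.0, 9.5):
    print(f"pH {ph}: {100 * fraction_unprotonated(ph):.1f}% free amine")
```

Below the pKa, the free-amine fraction rises roughly tenfold per pH unit. Passing a larger `pka` shows how much smaller the reactive fraction is for more basic groups at the same pH, for example a value in the 12-13 range the text gives for arginine.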

(2) Thiols (Cystine, Cysteine, Methionine). Another common reactive group in proteins is the thiol residue from the sulfur-containing amino acid cystine and its reduction product cysteine (or half-cystine), which are counted together as one of the 20 amino acids. Cysteine contains a free thiol group, which is more nucleophilic than amines and is generally the most reactive functional group in a protein. It reacts with some of the same modification reagents as the amines discussed in the previous section. It can also react with reagents that are not very reactive toward amines. Thiols, unlike most amines, are reactive at neutral pH, and therefore they can be coupled to other molecules selectively in the presence of amines (eq 2).

$$\text{H}_2\text{N-Protein-SH} + \text{RX} \longrightarrow \text{H}_2\text{N-Protein-SR} + \text{HX} \qquad (2)$$

This selectivity makes the thiol group the linker of choice for coupling two proteins together. Methods that couple only amines (e.g., glutaraldehyde or dimethyl adipimidate coupling) can result in formation of homodimers, oligomers, and other unwanted products (10). Because free sulfhydryl groups are relatively reactive, proteins with these groups often exist in their oxidized form as disulfide-linked oligomers or have internally bridged disulfide groups. Immunoglobulin M is an example of a disulfide-linked pentamer, while immunoglobulin G is an example of a protein with internal disulfide bridges bonding the subunits together. In proteins such as these, reduction of the disulfide bonds with a reagent such as dithiothreitol (DTT) is required to generate the reactive free thiol (11). In addition to cystine and cysteine, some proteins also contain the amino acid methionine, which has sulfur in a thioether linkage. When cysteine is absent, methionine can sometimes react with thiol-reactive reagents such as iodoacetamides (12). However, selective modification of methionine is difficult to achieve and therefore is seldom used as a method of attaching small molecules to proteins.

(3) Phenols (Tyrosine). The phenolic substituent of the amino acid tyrosine can react in two ways. The phenolic hydroxyl group can form esters and ether bonds. The aromatic ring can undergo nitration, or coupling reactions with reagents such as diazonium salts at the position adjacent to the hydroxyl group. There is considerable literature describing the reaction of tyrosyl residues with diazonium compounds (13). For example, a p-aminobenzoyl biocytin derivative has been diazotized and reacted with protein tyrosine groups (14). Modification of tyrosines has primarily been used in structural studies rather than as a means for attaching specific labels, since acetylation and nitration can give useful information about the participation of tyrosine in the binding properties of proteins. Tyrosines can also react with amine-selective modification reagents to form unstable carboxylic acid esters or sulfate esters. This is often an unwanted side reaction, resulting in conjugates that slowly hydrolyze during storage. Methods for preventing this problem are discussed in a later part of this teaching editorial (section V.B.1).

(4) Carboxylic Acids (Aspartic Acid, Glutamic Acid). Proteins contain carboxylic acid groups at the carboxy-terminal position and within the side chains of the dicarboxylic amino acids aspartic acid and glutamic acid. The low reactivity of carboxylic acids in water usually makes it difficult to use these groups to selectively modify proteins and other biopolymers. Where this is done, the carboxylic acid group is usually converted to a reactive ester by use of a water-soluble carbodiimide. It is then reacted with a nucleophilic reagent such as an amine or a hydrazide (15, 16) (eq 3).

$$\text{Protein-C(=O)OH} \longrightarrow \text{Protein-C(=O)X} \xrightarrow{\text{RNHNH}_2} \text{Protein-C(=O)NHNHR} \qquad (3)$$

The amine reagent should be weakly basic so that it reacts specifically with the activated carboxylic acid in the presence of the other amines on the protein. The reason is that protein cross-linking can occur when the pH is raised above 8.0, the range in which the protein amines are partially unprotonated and reactive. For this reason, hydrazides, which are weakly basic, are useful in coupling reactions with a carboxylic acid (17). This reaction can also be used effectively to modify the carboxy-terminal group of small peptides.

(5) Other Amino Acid Side Chains (Arginine, Histidine, Tryptophan). Chemical modification of other amino acid side chains in proteins has not been extensive compared to the groups discussed above. The high pKa of the guanidine functional group of arginine (pKa = 12-13) necessitates more drastic reaction conditions than most proteins can survive. Arginine modification has been accomplished primarily with glyoxals and α-diketone reagents (18). Tryptophan modification requires harsh conditions and is seldom carried out except as a method of analysis in structural or activity studies. Histidines have been subjected to photooxidation (19) and to reaction with iodoacetates (20).

B. Protein Modification Reagents. This section will survey the extensive selection of reagents available for protein modification. Understanding how to use these reagents rests on two fundamental principles: (1) recognition of the reactive group(s) on the protein or peptide that can be modified, and (2) knowledge of the chemical reactions these reactive groups will participate in and the nature of the chemical bonds that will result from those reactions.

(1) Amine-Reactive Reagents. These reagents react primarily with lysines and the α-amino groups of proteins and peptides, under both aqueous and nonaqueous conditions. Some amine-reactive reagents are more reactive, and therefore less selective, than others. Understanding this property is necessary in order to choose the best reagent for modification of a specific protein. The following amine-reactive reagents are available.

(a) Reactive Esters (Formation of an Amide Bond). Reactive esters, especially N-hydroxysuccinimide (NHS) esters, are among the most commonly used reagents for modification of amine groups (21). These reagents have intermediate reactivity toward amines, with high selectivity toward aliphatic amines. Their reaction rate with aromatic amines, alcohols, phenols (tyrosine), and histidine is relatively low. Reaction of NHS esters with amines under nonaqueous conditions is facile, so they are useful for derivatization of small peptides and other low molecular weight biomolecules. The optimum pH for reaction in aqueous systems is 8.0-9.0. The aliphatic amide products that are formed are very stable (eq 4).

$$\text{Protein-NH}_2 + \text{RC(=O)-O-N(succinimide)} \longrightarrow \text{Protein-NHC(=O)R} + \text{HO-N(succinimide)} \qquad (4)$$

NHS esters are slowly hydrolyzed by water (22) but are stable to storage if kept well desiccated. Virtually any molecule that contains a carboxylic acid, or that can be chemically modified to contain one, can be converted into its NHS ester (eq 5). This makes these reagents among the most powerful protein-modification reagents available.

$$\text{R-C(=O)OH} \longrightarrow \text{R-C(=O)-O-N(succinimide)} \qquad (5)$$

Newly developed NHS esters are available with sulfonate groups that improve water solubility (23). A short list of reactive NHS ester derivatives of fluorescent probes, biotin, and other molecules is given in Table I.

(b) Isothiocyanates (Formation of a Thiourea Bond). Isothiocyanates, like NHS esters, are amine-modification reagents of intermediate reactivity, and they form thiourea bonds with proteins and peptides (eq 6).

$$\text{Protein-NH}_2 + \text{R-N=C=S} \longrightarrow \text{Protein-NH-C(=S)-NHR} \qquad (6)$$

They are somewhat more stable in water than the NHS esters and react with protein amines in aqueous solution optimally at pH 9.0-9.5. This is a higher pH than the optimal pH for NHS esters (which undergo competing hydrolysis at pH 9.0-9.5). Isothiocyanates may therefore be less suitable than NHS esters when modifying proteins that are sensitive to alkaline pH conditions. One of the most commonly used fluorescent derivatization reagents for proteins is fluorescein isothiocyanate (FITC). A number of other fluorescent dyes (coumarins and rhodamines) have been coupled to proteins via their reactive isothiocyanates (24).
(c) Aldehydes (Formation of Imine, Reduction to Alkylamine Bond). Aldehyde groups react under mild aqueous conditions with aliphatic and aromatic amines to form an intermediate known as a Schiff base (an imine). The Schiff base can be selectively reduced by the mild reducing agent sodium cyanoborohydride to give a stable alkylamine bond (eq 7) (44, 63).

$$\text{Protein-NH}_2 + \text{RCH=O} \longrightarrow \text{Protein-N=CHR} \xrightarrow{\text{NaBH}_3\text{CN}} \text{Protein-NHCH}_2\text{R} \qquad (7)$$

This method of amine modification is not used in protein conjugations as frequently as the activated ester method. However, when the molecule to be attached has an aldehyde group, or can easily be converted to an aldehyde, the method is mild, simple, and very effective. Aldehydes (glyoxals) can also react with protein arginine groups (25, 26) and with the nucleic acid base guanosine, making them of some use in nucleic acid modification (27).

Table I. Succinimidyl Ester Probes

probe | function | ref
succinimidyl fluorescein-5-(and -6-)carboxylate | fluorescent label | 76, 76
succinimidyl N,N,N',N'-tetramethylrhodamine-5-(and -6-)carboxylate | fluorescent label | 76
succinimidyl 7-amino-4-methylcoumarin-3-acetate | fluorescent label | 77
succinimidyl X-rhodamine-5-(and -6-)carboxylate | fluorescent label | 75, 78
succinimidyl D-biotin | ligand, affinity label | 79
succinimidyl 3-(4-hydroxyphenyl)propionate | radioiodination label | 80

(d) Sulfonyl Halides (Formation of a Sulfonamide Bond). Sulfonyl halides are highly reactive amine-modifying reagents. They are unstable in water, especially at the pH required for reaction with aliphatic amines. However, they form extremely stable sulfonamide bonds that can survive even amino acid hydrolysis (eq 8).

$$\text{Protein-NH}_2 + \text{R-SO}_2\text{-Cl} \longrightarrow \text{Protein-NH-SO}_2\text{-R} + \text{HCl} \qquad (8)$$

For this reason, sulfonamide conjugates are useful for amine-terminus derivatization (Dansyl-Edman degradation) and as tracers (28). In addition to amines, sulfonyl halides also react with phenols (tyrosine), thiols (cysteine), and imidazoles (histidine) on proteins (29), so they are less selective than either NHS esters or isothiocyanates. The conjugates formed with thiols, imidazoles, and phenols are all unstable. If they are not removed during purification, they can lead to loss of the label from the protein during long-term storage (see section V.B.1). One of the most widely used long-wavelength fluorescent probes, Texas Red, is a sulfonyl chloride. It has the longest-wavelength spectral properties of any of the common amine-reactive fluorescent labeling reagents (30).

(e) Miscellaneous Amine Reactive Reagents (Dichlorotriazines, Alkyl Halides, Anhydrides). The dichlorotriazine derivative of fluorescein, known as DTAF (I), has high reactivity with protein amines. It has been used to prepare fluorescein-labeled tubulin with minimal loss of activity (31). In addition to amines, dichlorotriazines will react with alcohols at elevated temperatures (60-90 °C), and they are used to prepare polysaccharide conjugates (32). Some alkyl halides, including the iodoacetamides commonly used to modify thiols, will react with amines of proteins if the pH is in the range 9.0-9.5 (33). Acid anhydrides have also been used to modify amines of proteins. Succinic anhydride is commonly used to succinylate amine groups of basic proteins in order to change their isoelectric point and other charge-related properties (34). Mixed anhydrides derived from reaction of a carboxylic acid with carbitol or 2-methylpropanol chloroformates (eq 9) are excellent reagents for modification of amines under mild conditions (35). Of these, the carbitol mixed anhydride is relatively water soluble and is the preferred reagent for modification of amines in aqueous solution.

$$\text{RCOOH} + \text{ClC(O)OCH}_2\text{CH(CH}_3)_2 \longrightarrow \text{RC(O)OC(O)OCH}_2\text{CH(CH}_3)_2$$
$$\text{Protein-NH}_2 + \text{RC(O)OC(O)OCH}_2\text{CH(CH}_3)_2 \longrightarrow \text{Protein-NHC(O)R} + \text{HOC(O)OCH}_2\text{CH(CH}_3)_2 \qquad (9)$$

(2) Thiol-Reactive Reagents. Thiol-reactive reagents are those that couple to thiol groups on proteins to give thioether-coupled products. These reagents react rapidly at neutral (physiological) pH. They can therefore react with thiols selectively in the presence of amine groups.

(a) Haloacetyl Derivatives (Formation of a Thioether Bond). These reagents (usually iodoacetamides) are among the most frequently used reagents for thiol modification. In most proteins, the site of reaction is at cysteine groups that are either intrinsically present or that result from reduction of cystines. The reaction of iodoacetate with cysteine is approximately twice as fast as that with bromoacetate and 20-100 times as rapid as that with chloroacetate (36). As mentioned previously, methionines can sometimes react with haloacetamides in the absence of cysteines (12). Haloacetamides react rapidly with thiols at neutral pH, at room temperature or below, and under these conditions most aliphatic amines are unreactive. In addition to proteins, haloacetamides have been reacted with thiolated peptides, with thiolated primers for DNA sequencing (37), and with RNA (on thiouridine) (38). The thioether linkages formed from reaction of haloacetamides are very stable. A potential problem in using iodoacetamides as modification reagents is their instability to light, especially in solution. They must therefore be protected from light in storage and during reaction. The fluorescein and rhodamine iodoacetamides are among the most intensely fluorescent sulfhydryl reagents available for protein and peptide modification.

(b) Maleimides (Formation of a Thioether Bond). Maleimides (eq 10) are similar to iodoacetamides in their application as reagents for thiol modification. However, they are more selective than iodoacetamides, since they do not react with histidine, methionine, or thionucleotides (39, 40). The optimum pH for the reaction of maleimides is near 7.0. Above pH 8.0, maleimides can hydrolyze to nonreactive maleamic acids (41).

$$\text{H}_2\text{N-Protein-SH} + \text{R-maleimide} \longrightarrow \text{H}_2\text{N-Protein-S-(}N\text{-R-succinimid-3-yl)} \qquad (10)$$

(c) Miscellaneous Thiol-Reactive Reagents. These reagents include bromomethyl derivatives and pyridyl disulfides. The bromomethyl derivatives are similar in reactivity to iodoacetamides. The haloalkyl derivatives monobromobimane and monochlorobimane (II; X = Br, Cl) react with glutathione and other thiols in cells to give fluorescent adducts, which provides a method for quantitating thiols (42). Pyridyl disulfides undergo an exchange reaction with protein thiols to give mixed disulfides (eq 11) (43).



$$\text{Protein-SH} + \text{RS-S-(2-pyridyl)} \longrightarrow \text{Protein-S-S-R} + \text{pyridine-2-thione} \qquad (11)$$



(3) Carboxylic Acid- and Aldehyde-Reactive Reagents. (a) Amines and Hydrazides (Formation of Amide or Alkylamine Bonds). Amines and hydrazides can be coupled to carboxylic acids of proteins in two stages. A water-soluble carbodiimide first activates the carboxyl group, which then reacts with the amine or hydrazide. As mentioned previously (section II.A.4), the amine or hydrazide reagent must be weakly basic. This lets it react selectively with the carbodiimide-activated protein in the presence of the more strongly basic protein ε-amines (lysines). The reaction of these probes with carbodiimide-activated carboxyl groups forms stable amide bonds (eq 12).

$$\text{Protein-COOH} + \text{RN=C=NR}' \longrightarrow \text{Protein-C(O)O-C(=NR)NHR}' \xrightarrow{\text{R}''\text{NH}_2} \text{Protein-C(O)NHR}'' + \text{RNHC(O)NHR}' \qquad (12)$$



Amines and hydrazides can also react with aldehyde groups, which can be generated on proteins by periodate oxidation of the protein's carbohydrate residues. In this case a Schiff base intermediate forms (eq 13). It can be reduced to an alkylamine with sodium cyanoborohydride, a mild and selective water-soluble reducing agent (44) (see also section II.B.1.c). Because Schiff base formation is reversible, formation of protein-protein products can be minimized by adding a large excess of the amine or hydrazide reagent.

$$\text{Protein-(carbohydrate)} + \text{NaIO}_4 \longrightarrow \text{Protein-CHO} \xrightarrow{\text{1) RNH}_2\text{; 2) NaBH}_3\text{CN}} \text{Protein-CH}_2\text{NHR} \qquad (13)$$

(4) Bifunctional Reagents. Bifunctional, or cross-linking, reagents are specialized reagents whose reactive groups form a bond between two different groups. These groups may be on the same molecule or on two different molecules. Bifunctional reagents fall into two types. Homobifunctional reagents carry the same reactive group at each end of the molecule, and heterobifunctional reagents carry a different reactive group at each end. Recent practice strongly favors heterobifunctional cross-linkers, in which each of the two reactive sites is selective toward a different functional group (amine reactive and thiol reactive, for example). Some of these reagents are available in a range of chain lengths. They are well suited to the controlled coupling of unlike biomolecules, such as two different proteins. Table II lists some frequently used heterobifunctional cross-linkers, with their reactivities and references describing their use.



(a) Amine Reactive – Thiol or Protected Thiol. A variety of reagents react selectively with thiols in the presence of amines, so these functional groups are very useful for attaching two different proteins together. Thiol-coupling methods are frequently employed to prepare protein-enzyme conjugates. If the proteins to be coupled do not contain intrinsic thiols, a single thiol group is typically introduced at an amine of one of the proteins by means of a heterobifunctional reagent (eq 14). Traut's reagent (iminothiolane) has been used extensively to introduce thiol groups selectively into proteins (45, 46). Many other bifunctional reagents contain both an amine-reactive group and a protected thiol group. Examples are succinimidyl (acetylthio)acetate (SATA) (47, 48) and succinimidyl 3-(2-pyridyldithio)propionate (SPDP) (43, 49). After deprotection, the thiol-containing protein is reacted with a thiol-reactive group on the other protein, which has been introduced by a similar technique. Alternatively, proteins carrying synthetic thiol groups introduced by modification can be coupled to a number of thiol-reactive derivatives of dyes, biotin, haptens, or other molecules.

$$\text{Protein(1)-NH}_2 + \text{(succinimidyl)-O-C(O)CH}_2\text{CH}_2\text{SC(O)CH}_3 \longrightarrow \text{Protein(1)-NHC(O)CH}_2\text{CH}_2\text{SC(O)CH}_3 \xrightarrow{\text{deprotection}} \text{Protein(1)-NHC(O)CH}_2\text{CH}_2\text{SH}$$
$$\text{Protein(1)-NHC(O)CH}_2\text{CH}_2\text{SH} + \text{Protein(2)-NHC(O)CH}_2\text{I} \longrightarrow \text{Protein(1)-NHC(O)CH}_2\text{CH}_2\text{SCH}_2\text{C(O)NH-Protein(2)} \qquad (14)$$

(b) Amine Reactive – Iodoacetamide. Iodoacetamides are primarily thiol-reactive groups, and their reaction with thiols is rapid at physiological pH. They can, however, react with amines under more alkaline conditions (above pH 9.0) and with long reaction times (section II.B.2.a). Iodoacetamides can be introduced into a protein or peptide that has no intrinsic thiols by means of amine-reactive derivatives (eq 15) (50). The resulting modified protein can then be coupled to any thiol-containing molecule, which is usually a thiol-containing protein.

$$\text{Protein-NH}_2 + \text{(succinimidyl)-O-C(O)(CH}_2)_n\text{NHC(O)CH}_2\text{I} \longrightarrow \text{Protein-NHC(O)(CH}_2)_n\text{NHC(O)CH}_2\text{I} \qquad (15)$$

(c) Amine Reactive – Maleimide. Maleimides can be introduced into a protein or peptide with heterobifunctional reagents that have an amine-reactive group at one end and the thiol-specific maleimide at the other end (eq 16). The applications are very similar to those for the iodoacetamides discussed in the preceding section. Specific applications include coupling of ricin to monoclonal antibodies (51) and linking of oligonucleotides to enzymes (52).

$$\text{Protein-NH}_2 + \text{(succinimidyl)-O-C(O)-R-(maleimide)} \longrightarrow \text{Protein-NHC(O)-R-(maleimide)} \qquad (16)$$

Table II. Heterobifunctional Cross-Linking Reagents

reagent | structure | reactivity | ref
succinimidyl 3-(2-pyridyldithio)propionate (SPDP) | (2-pyridyl)-S-SCH2CH2C(O)O-(succinimidyl) | primary amine, thiol | 49
succinimidyl trans-4-(N-maleimidylmethyl)cyclohexane-1-carboxylate (SMCC) | (succinimidyl)-O-C(O)-(cyclohexane-1,4-diyl)-CH2-(maleimide) | primary amine, thiol | 54, 48
succinimidyl (acetylthio)acetate (SATA) | CH3C(O)SCH2C(O)O-(succinimidyl) | primary amine, thiol | 47, 48
4-[(succinimidyloxy)carbonyl]-α-methyl-α-(2-pyridyldithio)toluene (SMPT) | | |

(d) Amine Reactive – Aldehyde. Aldehydes do not occur naturally in proteins, but they can be introduced in two ways. In the first method, carbohydrate groups on proteins are treated either with an oxidizing reagent, such as sodium periodate, or with a galactose oxidase/catalase enzyme system. Both treatments generate aldehyde groups on the sugar (53). Not all proteins contain carbohydrate groups, so a second method has also been employed, in which aldehydes are introduced with the reagent glutaraldehyde (10). Glutaraldehyde has been used extensively to couple two proteins together through their amine groups (eq 17). However, like other homobifunctional reagents, glutaraldehyde is being replaced by more selective heterobifunctional reagents such as those discussed above.

$$\text{Protein(1)-NH}_2 + \text{Protein(2)-NH}_2 + \text{O=CH(CH}_2)_3\text{CH=O} \longrightarrow \text{Protein(1)-NH(CH}_2)_5\text{NH-Protein(2)} \qquad (17)$$

(5) Photoactivatable Reagents. Some reagents can be activated by light (photons) to produce a reactive intermediate that couples to various functional groups on biomolecules. Two of the most frequently used photoactivatable reagents for this purpose are aromatic azides and benzophenones.

(a) Aromatic Azides. Aromatic azides are efficiently photolyzed by illumination with ultraviolet light at 300-350 nm. This photolysis produces a nitrene, which reacts rapidly and nonspecifically with solvent molecules or with functional groups on biomolecules. Because the nitrene is so reactive, almost any functional group or amino acid can be modified. Recent improvements in azide-based protein modification reagents have produced perfluorinated azides. These generate more stable nitrene intermediates and therefore react with the protein more efficiently (up to 40%) (57, 58). A primary use of these highly reactive reagents is in photoaffinity labeling experiments. In such experiments, the aromatic azide is attached to a drug or other molecule that binds specifically to a protein binding site (for example, an enzyme inhibitor or a nucleotide analogue) and is then photolyzed. The location and type of bond formed provide information about the environment near the binding site (59). Aryl azides are also useful as heterobifunctional cross-linkers. Succinimidyl azidobenzoate (SAB), p-azidophenacyl bromide, and 4-maleimidobenzophenone have been employed to couple proteins through a dark reaction with amines or thiols followed by light activation (56, 58, 60, 61).

(b) Benzophenones. Like azides, benzophenones are photoactivatable by ultraviolet light. Once activated, however, they can either react with functional groups or return to the ground state. These molecules can therefore sometimes be reactivated if they do not react on the first activation. They are also used as photoaffinity labels, in much the same way as the aromatic azides (62).

III. PRACTICAL CONSIDERATIONS

A thorough knowledge of protein reactivity and of the reagents available for the desired modification is necessary but not sufficient. The researcher must also understand the practical aspects of reacting highly reactive small organic molecules with large, complex, conformationally sensitive, water-soluble biopolymers. The following discussion addresses some of the general rules, problems, and pitfalls of protein-modification chemistry.

A. Choosing the Right Buffer. Conjugations should be carried out in a well-buffered system at a pH that is optimal for the reaction. In most cases the ionic strength should be in the range of 25-100 mM. Thiol groups and α-amino groups are modified selectively at physiological pH (7.0-7.5), where phosphate buffers are ideally suited. The more strongly basic lysine amines require a more alkaline pH, in the range 8.0-9.5, where phosphate solutions do not buffer well. For these reactions, carbonate/bicarbonate (the pH of 100 mM bicarbonate is 9.2) or borate buffers are quite satisfactory. For example, conjugations with NHS esters are best carried out in pH 8.2 bicarbonate buffer, whereas isothiocyanates require the higher pH (9.0-9.5) provided by carbonate or borate buffers. In some cases, compatibility with the protein will dictate the choice of buffer.
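As a rough illustration, the pH guidance in this section can be encoded as a lookup. The function `suggest_buffer` is hypothetical and not part of any published procedure. It covers only the pH ranges stated above and ignores protein compatibility, which can override the choice.

```python
def suggest_buffer(target_ph: float) -> str:
    """Return the buffer family recommended in section III.A for target_ph.

    Hypothetical helper; only the pH ranges named in the text are covered.
    """
    if 7.0 <= target_ph <= 7.5:
        # Thiols and alpha-amino groups react selectively at physiological pH.
        return "phosphate"
    if 8.0 <= target_ph <= 9.5:
        # Lysine amines need alkaline pH, where phosphate buffers poorly.
        return "carbonate/bicarbonate or borate"
    raise ValueError(f"pH {target_ph} is outside the ranges covered in section III.A")


if __name__ == "__main__":
    for ph in (7.2, 8.2, 9.2):
        print(ph, suggest_buffer(ph))
```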

B. Cosolvents. If the reagent to be attached to the biomolecule dissolves readily at millimolar concentrations in water or buffer, no cosolvent is needed. The reagent can then be added as a concentrated aqueous solution to the buffered reaction solution. Unfortunately, aqueous systems are very often incompatible with the reagent, because of poor solubility or high reactivity with water. In these cases a water-miscible cosolvent must be used that dissolves the reagent without decomposing it. At the same time, the cosolvent must not cause irreversible denaturation or precipitation of the biomolecule. Cosolvents that have been used successfully in protein modifications include methanol, ethanol, 2-propanol, 2-methoxyethanol, dioxane, dimethylformamide (DMF), and dimethyl sulfoxide (DMSO).

The most versatile of these cosolvents are DMF and DMSO. They are recommended for three reasons: (a) they are inert to many of the reactive reagents used in preparing conjugates, (b) they are miscible with water in all proportions, and (c) they are compatible with most aqueous protein solutions, even at up to 30% v/v. DMF is the solvent of choice for reactions of sulfonyl chlorides, because these reagents react with DMSO. It is usually important to dry cosolvents carefully and store them over a drying agent, which prevents competing hydrolysis of the reactive modification reagent.

C. Reaction Conditions. As a general rule, conjugation reactions should be carried out below room temperature, since most conjugation reagents still react rapidly at low temperature. Low temperatures tend to increase the selectivity of the reaction, which gives fewer side reactions and more consistent, reproducible results. A convenient procedure is to add the reagent to a gently stirred, buffered solution of the protein in an ice bath, then let the bath warm to room temperature over about 2 h. Very reactive reagents such as sulfonyl chlorides should be reacted under more carefully controlled conditions, such as 4 °C for 1 h. Stirring can be done with a magnetic stir bar but should not be excessively fast, because violent mixing can denature proteins. The reagent should be added dropwise and as slowly as possible, since gradual addition increases the selectivity of the reaction.

(1) Protein Concentration. Conjugation is bimolecular, so its rate depends on the concentrations of both reagent and protein. Hydrolysis of the reagent is pseudo-first-order, so its rate depends only on the reagent concentration. Dilution therefore shifts the competition between conjugation and hydrolysis toward loss of reagent. Protein concentrations above 10 μM are strongly recommended, with an optimum in the range 50-100 μM.
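The competition described above can be sketched with a simple kinetic model. The model assumes that conjugation is second order (rate constant `k_conj`) and hydrolysis is pseudo-first-order (rate constant `k_hyd`), and it treats the protein concentration as constant during the reaction. The rate constants are hypothetical placeholders, not measured values for any reagent. Only the trend with protein concentration is meaningful.

```python
def conjugation_fraction(protein_conc_m: float, k_conj: float, k_hyd: float) -> float:
    """Fraction of reagent that couples to protein instead of hydrolyzing.

    Simplified competing-reaction model (an assumption, not from the source):
      conjugation rate = k_conj * [protein] * [reagent]   (bimolecular)
      hydrolysis rate  = k_hyd * [reagent]                 (pseudo-first-order)
    [protein] is held constant, ignoring depletion of reactive sites.
    """
    k_prime = k_conj * protein_conc_m  # pseudo-first-order conjugation rate, s^-1
    return k_prime / (k_prime + k_hyd)


# Hypothetical rate constants chosen only to illustrate the trend.
K_CONJ = 1.0e3  # M^-1 s^-1
K_HYD = 1.0e-2  # s^-1

if __name__ == "__main__":
    for conc_um in (1, 10, 50, 100):
        f = conjugation_fraction(conc_um * 1e-6, K_CONJ, K_HYD)
        print(f"{conc_um:>4} uM protein: {f:.1%} of reagent conjugated")
```

With these made-up constants, the coupled fraction rises from about 9% at 1 µM protein to about 91% at 100 µM. This matches the direction of the recommendation to work above 10 µM.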

(2) pH. In modification of amines, only the unprotonated form is reactive. The pH must therefore be high enough that a significant fraction of the amines are unprotonated. Because the average pKa of lysine amines is above 9, a higher pH favors labeling. Against this, the rate of reagent hydrolysis increases rapidly above pH 9, and proteins tend to be unstable at higher pH. A free amine terminus has a pKa near 7 and is sometimes modified preferentially when the reaction is run at neutral pH. An effective compromise in most cases is a pH close to 9.0-9.2 if the protein is stable. If the protein is unstable, use a lower pH combined with more reagent and longer reaction times. The succinimidyl esters and DTAF appear to react more efficiently at lower pH than the isothiocyanates and sulfonyl chlorides do. In our experience with succinimidyl esters, a reaction pH of around 8.2 gives excellent results for most proteins.
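The pKa dependence follows from the Henderson-Hasselbalch equation. The sketch below computes the fraction of an amine in the reactive, unprotonated form. The N-terminal pKa of 7 comes from the text. The lysine value of 10.5 is an assumed textbook figure, since the text says only that it is above 9.

```python
def fraction_unprotonated(ph: float, pka: float) -> float:
    """Fraction of an amine present as reactive R-NH2 (Henderson-Hasselbalch).

    [R-NH2] / ([R-NH2] + [R-NH3+]) = 1 / (1 + 10**(pKa - pH))
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Illustrative pKa values: ~7 for a free N-terminus (as stated in the text),
# and an assumed 10.5 for lysine epsilon-amines (the text says only "above 9").
PKA_N_TERMINUS = 7.0
PKA_LYSINE = 10.5

if __name__ == "__main__":
    for ph in (7.0, 8.2, 9.2):
        print(
            f"pH {ph}: N-terminus {fraction_unprotonated(ph, PKA_N_TERMINUS):.1%}, "
            f"lysine {fraction_unprotonated(ph, PKA_LYSINE):.2%}"
        )
```

Each pH unit below an amine's pKa lowers its reactive fraction about tenfold, which is why lysine labeling calls for alkaline pH. The model does not capture the faster reagent hydrolysis and protein instability above pH 9 that limit how far the pH can be raised.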

(3) Reaction Time. Usually 1-2 h is sufficient for conjugation reactions to go to completion, and many published procedures specify overnight reactions. Longer reaction times are acceptable if convenient, because the degree of labeling is generally limited by the ratio of reagent to protein rather than by the reaction time. The more reactive the reagent, the shorter the reaction time; sulfonyl chloride reactions, for example, are faster than NHS ester reactions.

IV. FACTORS INFLUENCING CHOICE OF MOLAR RATIO OF REACTANTS

A. End Use of Reagents. (1) Immunogen – High Degree of Labeling. Protein conjugates are frequently prepared to raise specific antibodies to a drug or other hapten in a host animal. The drug or hapten is conjugated to a high molecular weight protein carrier, which is injected into the animal to elicit an immune response. Over time, the animal produces antibodies specific for the drug or hapten. For this purpose a high degree of labeling of the protein carrier is desirable, since more labels generally increase the strength and specificity of the immune response.

(2) Labeled Antibody or Enzyme – Low to Moderate Degree of Labeling. Antibodies and enzymes are relatively sensitive to substitution, because there are usually reactive amino acid side chains (amines, thiols, histidines) in or near their binding sites. A low to moderate degree of labeling is therefore preferred, to preserve binding specificity or enzyme activity. Excessive labeling can also decrease the solubility of the conjugates, which further reduces the overall activity. With many fluorescent labels, a high dye to protein ratio causes a dramatic decrease in the fluorescence efficiency of the conjugates (63, 64). In our experience with antibodies, a substitution ratio in the range 4-6 is usually optimal for good retention of binding activity.
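The substitution ratio quoted above is usually measured spectrophotometrically after purification. This section does not describe the method. The sketch below uses the common absorbance approach: dye absorbance at its maximum, and protein absorbance at 280 nm corrected for the dye's contribution through a correction factor `cf_280`. All constants in the usage lines are illustrative placeholders, not values for any particular dye or antibody.

```python
def degree_of_labeling(a_max: float, a_280: float, eps_dye: float,
                       eps_protein: float, cf_280: float) -> float:
    """Moles of label per mole of protein from absorbance readings.

    a_max       -- conjugate absorbance at the dye's absorption maximum
    a_280       -- conjugate absorbance at 280 nm
    eps_dye     -- dye molar extinction coefficient at its maximum (M^-1 cm^-1)
    eps_protein -- protein molar extinction coefficient at 280 nm (M^-1 cm^-1)
    cf_280      -- dye's A280/A_max ratio, used to remove its 280 nm contribution
    Assumes a 1 cm path length and that free dye has been removed.
    """
    a_protein = a_280 - cf_280 * a_max
    if a_protein <= 0:
        raise ValueError("corrected A280 must be positive")
    dye_conc = a_max / eps_dye
    protein_conc = a_protein / eps_protein
    return dye_conc / protein_conc


if __name__ == "__main__":
    # Illustrative numbers only.
    dol = degree_of_labeling(a_max=0.50, a_280=0.45, eps_dye=70_000,
                             eps_protein=203_000, cf_280=0.30)
    print(f"degree of labeling: {dol:.1f}")
```

With these placeholder readings, the result is about 4.8, inside the 4-6 window the text recommends for antibodies.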

(3) Fluorescent Labeled Proteins/Peptides – Low to Moderate Degree of Labeling. Fluorescent labels are often very sensitive to their molecular environment, so their fluorescence intensity almost always decreases when they are bound to proteins and other biomolecules. Fluorescence also decreases when the labels lie close to one another, probably because excited-state energy is transferred from one molecule to another (quenching) (65). As more dyes are attached to a protein, the total fluorescence increases, but the fluorescence efficiency decreases because of this quenching. Some dyes are more sensitive to quenching than others. FITC is about 50-70% quenched on IgG at a dye/protein ratio of 6 (66). Under the same conditions, Cascade Blue, a newly developed blue fluorescent dye (67), retains nearly 100% of its fluorescence efficiency. The number of dyes that can be conjugated to a protein without substantial loss of fluorescence depends on the size of the protein and on the distance between the functional groups to which the label is attached. Usually more dyes can be attached to a large protein than to a small protein or peptide. A general rule is 4-6 dyes/protein for fluorescein conjugates and 2-3 dyes/protein for rhodamines. The degree of labeling depends on several factors:
- the relative reactivity of the labeling reagent toward the protein and toward water;
- the molecular weight and number of reactive amines on the protein;
- the reactant concentrations, especially that of the protein.
The exact amount of label to use must be determined by experiment. As a guideline, however, 10 mol of a typical isothiocyanate or NHS ester is needed to label 1 mol of a protein. Because its competing hydrolysis is faster, 20 mol of a sulfonyl chloride such as Texas Red is required to label 1 mol of a protein.
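The molar guideline above translates directly into an amount of reagent. The sketch below is a hypothetical helper. The 10- and 20-fold excesses come from the text, while the protein masses and molecular weights in the usage lines are illustrative.

```python
# Molar excess of reagent per mole of protein, from the guideline above.
MOLAR_EXCESS = {
    "isothiocyanate": 10,
    "nhs_ester": 10,
    "sulfonyl_chloride": 20,  # faster competing hydrolysis (e.g. Texas Red)
}


def reagent_nmol(protein_mg: float, protein_mw: float, reagent_class: str) -> float:
    """Nanomoles of labeling reagent for a first labeling trial.

    protein_mg    -- mass of protein to be labeled (mg)
    protein_mw    -- protein molecular weight (g/mol)
    reagent_class -- key into MOLAR_EXCESS
    """
    protein_nmol = protein_mg * 1e-3 / protein_mw * 1e9
    return MOLAR_EXCESS[reagent_class] * protein_nmol


if __name__ == "__main__":
    # Illustrative: 1 mg each of a 150 kDa and a 15 kDa protein.
    for mw in (150_000, 15_000):
        print(mw, round(reagent_nmol(1.0, mw, "nhs_ester"), 1), "nmol NHS ester")
```

At the same molar ratio, 1 mg of a 15 kDa protein needs ten times as much reagent as 1 mg of a 150 kDa protein. This is the per-gram effect discussed in section IV.B.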

B. Number of Reactive Groups on the Protein. Proteins vary greatly in their number of reactive amino acid groups. Some proteins have 40 or more reactive amine groups, while others may have only one or two amine or thiol groups. The reactivity of these groups with the labeling reagent, and their effective concentration in solution, determine how much labeling reagent is needed for the desired degree of substitution. Consequently, low molecular weight proteins or peptides with few reactive groups require more labeling reagent per gram than high molecular weight proteins with many reactive groups.

C. Solubility of Modification Reagent in Reaction Solution. (1) Cosolvent Sometimes Required. The use of cosolvents was explained in section III.B. Some labeling reagents are very hydrophobic. Even though such a reagent dissolves readily in DMF or DMSO, it may precipitate when added to the buffered protein solution. This problem can often be circumvented by first adding cosolvent gradually, with stirring, to the buffered protein solution until it contains 20-25% cosolvent. The ionic strength of the buffer should be no more than 60 mM, so that the buffer does not salt out when the cosolvent is added. The solution of labeling reagent in cosolvent is then added so that the reaction mixture ends up at around 30% cosolvent by volume. This modification often prevents precipitation of the labeling reagent. Many proteins are stable in 30% DMSO or DMF, but the stability of the protein under these conditions should be determined before the technique is used.
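The two-step addition above reduces to a small volume calculation. The sketch below assumes volumes are additive, which is only approximately true for DMF or DMSO mixed with water. It uses the 20% and 30% figures from the text as defaults, and the name `cosolvent_plan` is hypothetical.

```python
from typing import Tuple


def cosolvent_plan(aqueous_ml: float, pre_fraction: float = 0.20,
                   final_fraction: float = 0.30) -> Tuple[float, float]:
    """Volumes (mL) for two-step cosolvent addition.

    Returns (neat cosolvent to add first, reagent stock in cosolvent to add second).
    Assumes additive volumes, an approximation for DMF/DMSO-water mixtures.
    """
    if not 0.0 <= pre_fraction < final_fraction < 1.0:
        raise ValueError("need 0 <= pre_fraction < final_fraction < 1")
    # Step 1: pre / (aqueous + pre) = pre_fraction
    pre_ml = aqueous_ml * pre_fraction / (1.0 - pre_fraction)
    # Step 2: (pre + stock) / (aqueous + pre + stock) = final_fraction
    stock_ml = (final_fraction * (aqueous_ml + pre_ml) - pre_ml) / (1.0 - final_fraction)
    return pre_ml, stock_ml


if __name__ == "__main__":
    pre, stock = cosolvent_plan(1.0)
    print(f"add {pre:.2f} mL cosolvent, then {stock:.2f} mL reagent stock")
```

For 1.0 mL of aqueous protein solution, this gives 0.25 mL of neat cosolvent first, then about 0.18 mL of reagent stock. The stock concentration then has to be chosen so that this volume delivers the required amount of reagent.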

(2) Two-Stage Labeling as a Last Resort. The labeling reagent may still precipitate when added to the protein solution, even with the technique described in section IV.C.1. In that case, it may be possible to purify the conjugate and then repeat the labeling procedure to increase the degree of substitution.

D. Solubility of Conjugate. (1) Conjugate Is Often Less Soluble Than Native Protein. Problems with solubility of the conjugate can occur, most often when the labeling reagent is hydrophobic or contains multiple ionic groups. These physical properties of the label can upset the natural folding of the protein and make the conjugate significantly less soluble than the native protein (30).

(2) Overlabeling Can Cause Precipitation of Conjugate. Overlabeling can produce the same undesirable results noted above. The best solution to these problems is to use a lower ratio of labeling reagent to protein, which gives a conjugate with a lower degree of substitution.

V. PURIFICATION OF CONJUGATES 

A. Removal of Excess Noncovalently Bound Labeling Reagent. (1) Dialysis – Simple, Inexpensive Purification Method – Inefficient for Hydrophobic Molecules. Dialysis is the simplest, but most time-consuming, method of purifying protein conjugates. Not all molecules dialyze efficiently. The rate of dialysis depends on a molecule's relative affinity for the protein versus the dialysis solution. Molecules that are sparingly soluble in water, or strongly adsorbed to the protein surface, take a long time to dialyze. Dialysis works best when the labeling reagent and its unreacted byproducts are hydrophilic. When purifying conjugates by dialysis, use a buffer volume of at least 100 times the volume of the conjugate solution. Change the dialysis buffer at least five times, allowing at least 4 h of dialysis between changes.
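The recommended buffer volumes and number of changes determine how thoroughly dialysis can clear a free small molecule. This can be estimated under idealized assumptions: each change reaches equilibrium, the molecule does not bind the protein, and the sample volume stays constant. These assumptions fail precisely for the hydrophobic or protein-bound molecules the text warns about, so the estimate is a best case. The function name is hypothetical.

```python
def residual_fraction(buffer_to_sample_ratio: float, changes: int) -> float:
    """Fraction of a free small molecule left in the sample after dialysis.

    Idealized model (an assumption): every buffer change reaches equilibrium,
    the molecule does not bind the protein, and the sample volume is constant.
    At equilibrium the sample keeps V_sample / (V_sample + V_buffer) of the solute.
    """
    per_change = 1.0 / (1.0 + buffer_to_sample_ratio)
    return per_change ** changes


if __name__ == "__main__":
    # 100-fold buffer volume, five changes, as recommended above.
    print(f"{residual_fraction(100, 5):.1e}")
```

With a 100-fold buffer volume and five changes, the idealized residual is about 1e-10 of the starting amount. Real residuals are much higher for molecules that adsorb to the protein.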

(2) Gel Filtration – Faster Than Dialysis – Effectively Removes Most Hydrophilic and Hydrophobic Labeling Reagents. Gel exclusion chromatography separates conjugates by size from excess noncovalently bound labeling reagent and other low molecular weight impurities. The small molecules enter the pores of the gel and are retarded. The larger protein conjugate molecules are excluded from the pores and pass through in the void volume. The method is very fast and effective for purifying conjugates of both hydrophobic and hydrophilic labeling reagents. A common technique uses a Sephadex G-25 or similar column with about 2 mL of bed volume per milligram of protein, packed in any suitable buffer (30). When dyes are used, the conjugate and free dye bands are usually clearly visible during elution. Many other types of labels can be visualized by holding a hand-held UV lamp close to the column during chromatography. Automatic fraction collectors with UV monitors are also frequently used. If partial precipitation has occurred during the reaction, the samples should be centrifuged before the column is run. The solution of labeled protein will contain a mixture of species with varying degrees of substitution. If required, lightly and heavily labeled fractions can be separated by ion-exchange chromatography. One passage through a gel filtration column usually removes most of the unreacted label, but some proteins bind small molecules with high avidity. Additional purification steps may be needed to purify these conjugates completely.

(3) Hydrophobic Interaction Adsorbents – Removes Strongly Bound Hydrophobic Labeling Reagents. Some labeling reagents have a very strong affinity for certain proteins and cannot be completely removed by gel filtration. After gel filtration has removed most of the unreacted label, these conjugates can be purified further with microporous, hydrophobic polystyrene beads (68). In this procedure the conjugate is simply mixed with the beads. The small hydrophobic molecules are selectively adsorbed into the micropores, while the larger conjugate molecules are excluded.

B. Removal of Labeling Reagent Attached by Unstable Covalent Bonds. (1) Hydroxylamine Treatment – Hydrolysis of Tyrosine Ester Bonds under Mild Conditions. Section II.A.d describes the formation of tyrosine esters. Several reagents commonly used for protein modification, including NHS esters, isothiocyanates, and sulfonyl chlorides, can react with tyrosines to form these esters. These adducts are unstable and can hydrolyze even at physiological pH, so label is lost over time. Any measurable loss of label can interfere with the intended use of many conjugates. It is therefore advisable to treat all conjugates prepared with these types of reagents so as to remove any esters formed during the conjugation reaction. In most cases this can be done effectively by treating the conjugate with hydroxylamine before purification (69, 70). In this method, a 1.6 M solution of hydroxylamine at pH 8.0 is added to the conjugate solution to a final concentration of 0.1 M, and the solution is stirred at room temperature for 1 h. The conjugate is then purified by gel filtration or dialysis.
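Adding the 1.6 M hydroxylamine stock to reach 0.1 M is a simple mass balance. The sketch below solves it for the stock volume; the function name is hypothetical.

```python
def hydroxylamine_volume_ml(conjugate_ml: float, stock_m: float = 1.6,
                            final_m: float = 0.1) -> float:
    """Volume (mL) of hydroxylamine stock that brings the mixture to final_m.

    Mass balance: stock_m * v = final_m * (conjugate_ml + v)
    =>  v = final_m * conjugate_ml / (stock_m - final_m)
    """
    if stock_m <= final_m:
        raise ValueError("stock must be more concentrated than the target")
    return final_m * conjugate_ml / (stock_m - final_m)


if __name__ == "__main__":
    # mL of 1.6 M stock for 1.5 mL of conjugate solution
    print(round(hydroxylamine_volume_ml(1.5), 3))
```

The ratio works out to one volume of stock per fifteen volumes of conjugate, for example 0.1 mL of stock for 1.5 mL of conjugate solution.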

VI. EXPERIMENTAL METHODS FOR PREPARING 
PROTEIN CONJUGATES 

The general experimental procedures that follow describe methods for conjugating amine-reactive and thiol-reactive probes to proteins. They should be useful as a guide for the experimentalist. However, we strongly suggest consulting the numerous literature references given in this review and others for additional specific information. Coupling proteins with bifunctional reagents requires a very wide variety of experimental conditions, which makes a simple general procedure difficult to give. The reader is advised to consult the literature for specific procedures.

A. Amine-Reactive Probes. The following general procedure is recommended for the first trial and can be adapted to amine-reactive dye, biotin, hapten, and bifunctional linker conjugations. The procedure may be modified once the degree of substitution has been determined after purification (see below).

Step 1. Dissolve the protein at 50-100 μM in 50-100 mM sodium bicarbonate buffer, pH 9.2, at room temperature. Borate buffer is also suitable. Amine-based buffers, such as TRIS, are not recommended. Conjugations with succinimide esters and with reagents such as DTAF (6-[(4,6-dichlorotriazin-2-yl)amino]fluorescein) should be done at a lower pH. For these, a suitable buffer is 50-100 mM sodium bicarbonate at pH 8.2.

Step 2. Add sufficient protein-modification reagent 
from a stock solution to contain about 10 mol of isothio- 
cyanate or succinimide ester for each mole of protein or 
about 20 mol of sulfonyl chloride for each mole of protein. 



Although most protein-modification reagents have some solubility in water, it is recommended that a stock solution be prepared immediately before use in a water-miscible, nonhydroxylic solvent such as dimethylformamide (DMF), dimethyl sulfoxide (DMSO), or dioxane. The stock solution should be prepared fresh each time, since it is very difficult to store these solutions for any length of time without decomposition of the reagent. As a guideline, prepare a stock solution of about 10-20 mM protein-modification reagent in dry DMF. The fluorescent dyes Texas Red, Lissamine rhodamine B, and other sulfonyl chlorides must never be used in DMSO, with which they react. These stock solutions (prepared in dry DMF) are usually diluted about 10-fold into the protein solution with agitation, to avoid high local concentrations of reagent. Some reagents are quite hydrophobic and have little solubility in the aqueous protein solution; this is particularly true of some of the rhodamine and biotin succinimidyl esters. A technique that helps in these cases is to add a 20% volume of DMF or DMSO slowly to the protein/buffer solution before adding the stock solution of the reagent in DMF or DMSO (see section IV.C.1).
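The molar-excess guideline of Step 2 and the 10-20 mM stock recommendation combine into a simple volume calculation. The Python sketch below is illustrative only: the function name and the example inputs (10 mg of a 150 kDa protein, roughly the size of an IgG, and a 10 mM stock) are hypothetical and are not part of the procedure itself.

```python
def stock_volume_ul(protein_mg: float, protein_mw_da: float,
                    molar_excess: float, stock_mm: float) -> float:
    """Return the volume (uL) of reagent stock that supplies `molar_excess`
    mol of reagent per mol of protein.

    protein_mg    -- mass of protein to be labeled (mg)
    protein_mw_da -- protein molecular weight (g/mol, i.e. Da)
    molar_excess  -- mol reagent per mol protein (e.g. about 10 for
                     isothiocyanates or succinimidyl esters, about 20 for
                     sulfonyl chlorides)
    stock_mm      -- reagent stock concentration (mM = umol/mL)
    """
    if min(protein_mg, protein_mw_da, molar_excess, stock_mm) <= 0:
        raise ValueError("all inputs must be positive")
    protein_umol = protein_mg / protein_mw_da * 1000.0  # mg / (g/mol) = mmol -> umol
    reagent_umol = protein_umol * molar_excess
    umol_per_ul = stock_mm / 1000.0  # 1 mM = 1 umol/mL = 0.001 umol/uL
    return reagent_umol / umol_per_ul


# Hypothetical example: 10 mg of a 150 kDa protein, 10 mM stock in dry DMF.
ester_ul = stock_volume_ul(10, 150_000, molar_excess=10, stock_mm=10)
sulfonyl_ul = stock_volume_ul(10, 150_000, molar_excess=20, stock_mm=10)
print(f"succinimidyl ester: {ester_ul:.1f} uL; sulfonyl chloride: {sulfonyl_ul:.1f} uL")
```

Comparing the returned volume with the volume of the protein solution shows whether the stock will be diluted roughly 10-fold, as suggested above; if not, the stock concentration can be adjusted within the 10-20 mM range.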

Isothiocyanates and Succinimidyl Esters. Using a microliter syringe, add the solution of the modification reagent dropwise over a period of about 1 min to the stirred protein solution in an ice-water bath. Allow the reaction mixture to warm to room temperature and continue to stir for at least 2 h.

Sulfonyl Chlorides. Using a micropipet, quickly add the solution of the reagent to the stirred protein solution in an ice bath or in a cold room. Allow to react at 4 °C for 1 h.

Step 3. Separate the conjugate from unreacted dye on a gel filtration column using the appropriate buffer as described in section V. Texas Red and certain other rhodamine-based conjugates will still retain varying amounts of noncovalently adsorbed dye even after purification by gel chromatography. This protein-adsorbed dye can be removed by treating the conjugate with a hydrophobic adsorbent as described in section V.A.3.

B. Thiol-Reactive Probes. A general procedure suitable for conjugation of thiol-reactive probes, including maleimides, iodoacetates, and alkyl halides, is outlined below. As a rule, thiol-reactive reagents are more stable in water than the reactive esters; however, they should be handled carefully and stored in a freezer with protection from light and moisture. As with the reactive esters and isothiocyanates discussed above, only freshly prepared reagent solutions should be used. Protection from light is particularly important for iodoacetamides.

Step 1. Dissolve the protein at 50-100 mM in a suitable buffer at pH 7.0-7.5 (10-100 mM phosphate, TRIS, or HEPES) at room temperature. In this pH range, the protein thiol groups are sufficiently nucleophilic that they react almost exclusively with the reagent, even in the presence of the more numerous protein amines, which are protonated and relatively unreactive. As a general rule, it is advisable to carry out thiol modifications in an oxygen-free environment, since some thiols can be oxidized to disulfides. This is particularly important if the modification reagent is to be reacted with cysteines generated by prior reduction of a cystine disulfide with a reagent such as dithiothreitol. In this case, all buffers should be deoxygenated and the reactions carried out under an inert atmosphere to prevent re-formation of the disulfide.
Step 2. Add sufficient protein-modification reagent from a stock solution of the reagent to provide 10-20 mol of reagent for each mole of protein. If the reagent is water-soluble, an aqueous solution can be used; otherwise, the reagent can be dissolved in one of the water-miscible nonhydroxylic solvents recommended for use with amine-reactive reagents. The reagent concentration should be about 10-20 mM. Upon completion of the reaction with the protein, an excess of glutathione, mercaptoethanol, or another soluble low molecular weight thiol can be added to consume excess modification reagent, thus ensuring that no reactive species are present during the purification step.

Iodoacetamides. Reactions with iodoacetamides should be carried out in the dark, since light can cause reagent decomposition. Add the stock reagent solution dropwise and slowly to the gently stirred solution of the protein at room temperature over a period of about 1 min. Continue stirring for 2 h.

Maleimides. Reaction conditions are essentially the same as with iodoacetamides; however, maleimides are more selective toward thiol groups, allowing somewhat more latitude in the buffer pH. Decomposition to maleamic acids above pH 8.0 is a competing reaction. Add the stock reagent solution dropwise and slowly to the gently stirred protein solution at room temperature over a period of about 1 min, and allow the mixture to react for 2 h.

Step 3. Separate the conjugate from unreacted modification reagent as described in section V.

C. Storage of Conjugates. Conjugates should be stored as one normally stores the parent protein. If the protein is stable to freezing, then lyophilization is recommended for long term storage. Sodium azide at 2 mM or thimerosal may be added to inhibit bacterial growth. CAUTION: These preservatives may be toxic in live-cell use of conjugates. In addition, sodium azide is an inhibitor of the enzyme horseradish peroxidase (HRP). Therefore, thimerosal should be substituted as a preservative in situations where the conjugate is derived from HRP or it is anticipated that the conjugate will be used in the presence of HRP. Fluorescent dye conjugates should be protected from light.

VII. DETERMINATION OF THE DEGREE OF 
SUBSTITUTION OF PROTEIN CONJUGATES 

Several methods are available for determining the degree of substitution of modified proteins. If the modification results in the creation of thiol residues, as is often the case with bifunctional reagents, it is relatively straightforward to determine the degree of substitution by quantitation of thiols. Several colorimetric methods for thiol determination are available (43, 45, 47). Maleimides introduced into proteins can be determined by back-titration with 2-mercaptoethanol (81). Dyes and many other types of molecules introduced into proteins are usually determined by spectroscopic techniques, as described below.

This general procedure should be applicable to dyes and other molecules that have significant absorption above 280 nm.

The determination of dye/protein (D/P) levels by spectroscopy is accomplished by determining the apparent concentration of dye in the conjugate from its absorbance at its characteristic $\lambda_{max}$ and then determining the protein concentration of the conjugate from its absorbance at 280 nm. Because most dyes have some absorption at 280 nm, the absorbance of the conjugate at 280 nm must be corrected for the contribution of the dye to obtain the correct protein concentration. The ratio of these two concentrations, calculated by use of Beer's law ($A = \varepsilon C l$, where $\varepsilon$ = molar extinction coefficient, $A$ = absorbance, $C$ = molar concentration, and $l$ = path length), is then equal to the D/P ratio.

This method is inexact, because there is no way to know precisely how the spectral characteristics of the dye change when it is conjugated to the protein. The following assumptions and approximations are made.

(1) The extinction coefficient of the protein-bound dye at its absorption maximum is about the same as the extinction coefficient of the free dye in solution at its absorption maximum. Although there are undoubtedly some differences, experiments have shown that this assumption is at least approximately correct (64).

(2) The absorption of the protein-bound dye at 280 nm is about the same as the absorption of the free dye in solution. This assumption may be less reliable than the previous one, since the linking group probably contributes more to this portion of the spectrum, and this group can be substantially changed when attached to the protein. The following question arises: what is the "free dye"? There is no unambiguous answer to this question, since the dye, when attached to the protein, is different from the free dye, and its spectral properties will be somewhat different. If the NHS ester was used as the reagent, the best choice of free dye is probably the free acid or the lysine amide derivative. These may be available or can be synthesized. Do not use the NHS ester as the free dye, since the N-succinimidyl group absorbs strongly at 280 nm. Likewise, sulfonic acids can be used as the free dye when the protein-modification reagent was a sulfonyl chloride.

(3) The extinction coefficient of the conjugate at 280 nm is about the same as the extinction coefficient of the native protein. However, extensive modification of the protein may change the spectral absorption at 280 nm in an unknown manner.

Although these assumptions are obviously questionable, spectroscopy remains the easiest and most convenient method of determining D/P ratios. One alternative is to determine the protein concentration by weighing the conjugate, which eliminates problems with assumption 3, but this is tedious and carries the danger that the conjugate will denature when dried without buffer, or that the lyophilized conjugate will contain entrapped buffer salts. This method does not eliminate errors from assumptions 1 and 2. Another alternative is to digest a known amount of the conjugate chemically or with a proteolytic enzyme to degrade the molecule to small fragments containing the dye, and then to determine the concentration of the dye by spectroscopy. This is even more tedious and still does not usually give a pure dye product that can be compared spectrally with a known derivative. Because of the lack of convenient and suitable alternatives, direct spectroscopic determination is the most frequently used method of estimating D/P ratios (64, 71-74).

Procedure. Step 1. Obtain absorption spectra of the free dye and the dye-protein conjugate (note 1).

Step 2. Obtain extinction coefficients of the free dye and protein from a handbook of dyes and protein tables (5, 50).

Step 3. Perform these calculations (1-cm path length):

$$C_d = \frac{A_c}{\varepsilon_d}$$

$$C_p = \frac{A_{280} - A_c\,(A_{d(280)}/A_d)}{\varepsilon_p}$$

$$D/P = \frac{C_d}{C_p}$$

where $\varepsilon_d$ is the extinction coefficient of free dye at $\lambda_{max}$, $A_d$ is the absorbance of free dye at $\lambda_{max}$, $A_{d(280)}$ is the absorbance of free dye at 280 nm, $A_c$ is the absorbance of dye in the conjugate at $\lambda_{max}$, $\varepsilon_p$ is the extinction coefficient of protein at 280 nm, $A_{280}$ is the absorbance of protein in the conjugate at 280 nm, $C_d$ is the concentration of dye in the conjugate (mol/L), and $C_p$ is the concentration of protein in the conjugate (mol/L).
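The three calculations of Step 3 can be scripted directly. The Python function below is a minimal sketch under the same assumptions (1)-(3) listed above; its name is hypothetical, and the example values (a dye with $\varepsilon_d$ = 68,000 M⁻¹cm⁻¹ whose 280 nm absorbance is 0.30 of its $\lambda_{max}$ absorbance, and a protein with $\varepsilon_p$ = 210,000 M⁻¹cm⁻¹) are illustrative placeholders, not data from the text.

```python
def dye_to_protein_ratio(a_c: float, eps_d: float, a_280: float,
                         a_d: float, a_d_280: float, eps_p: float,
                         path_cm: float = 1.0) -> float:
    """Estimate the dye/protein (D/P) molar ratio of a conjugate.

    a_c     -- absorbance of the conjugate at the dye's lambda_max
    eps_d   -- extinction coefficient of free dye at lambda_max (M^-1 cm^-1)
    a_280   -- absorbance of the conjugate at 280 nm
    a_d     -- absorbance of a free-dye reference solution at lambda_max
    a_d_280 -- absorbance of the same free-dye solution at 280 nm
    eps_p   -- extinction coefficient of the protein at 280 nm (M^-1 cm^-1)
    path_cm -- cuvette path length in cm (Beer's law: A = eps * C * l)
    """
    c_d = a_c / (eps_d * path_cm)            # dye concentration, mol/L
    correction = a_d_280 / a_d               # dye A280 as a fraction of its A(lambda_max)
    protein_a280 = a_280 - a_c * correction  # A280 attributable to protein alone
    if protein_a280 <= 0:
        raise ValueError("dye correction exceeds the measured A280; check inputs")
    c_p = protein_a280 / (eps_p * path_cm)   # protein concentration, mol/L
    return c_d / c_p


# Hypothetical example values (not from the text):
ratio = dye_to_protein_ratio(a_c=0.68, eps_d=68_000, a_280=0.414,
                             a_d=1.00, a_d_280=0.30, eps_p=210_000)
print(f"D/P = {ratio:.1f}")  # prints "D/P = 10.0"
```

Only the ratio $A_{d(280)}/A_d$ enters the correction, so any single free-dye solution can serve as the reference; likewise the path length cancels in the final ratio.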
Abstract

The serine protease subtilisin is an important industrial enzyme as well as a model for understanding the enormous rate enhancements effected by enzymes. For these reasons, along with the timely cloning of the gene, ease of expression and purification and availability of atomic resolution structures, subtilisin became a model system for protein engineering studies in the 1980s. Fifteen years later, mutations in well over 50% of the 275 amino acids of subtilisin have been reported in the scientific literature. Most subtilisin engineering has involved catalytic amino acids, substrate binding regions and stabilizing mutations. Stability has been the property of subtilisin which has been most amenable to enhancement, yet perhaps least understood. This review will give a brief overview of the subtilisin engineering field, critically review what has been learned about subtilisin stability from protein engineering experiments and conclude with some speculation about the prospects for future subtilisin engineering. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Folding; Stability; Site-directed mutagenesis; Design; Directed evolution

1. Overview 

In March of 1985, the first UCLA Symposium on Protein Structure, Folding and Design convened in Keystone, Colorado [105]. The atmosphere reflected a distinct giddiness among many of us about the prospects of the newly anointed field of 'Protein Engineering' [170]. The meeting was timely because in the early 1980s a number of technical breakthroughs came together which enabled the introduction of specific mutations into a gene, heterologous expression of the altered protein, and relatively rapid assessment of the structural consequences of the mutations by X-ray structure determination. In the keynote address, however, Frederick Richards of Yale University asserted that while site-directed mutagenesis was fun, it was really just the next phase of chemical modification and unlikely to revolutionize understanding of protein folding and enzymology. After 15 years and thousands of site-directed mutants, it probably can be said that a good time has been had by all. But given the perspectives of time and experience, what has been accomplished from protein engineering? This review will give a brief overview of the subtilisin engineering field, critically review what has been learned about subtilisin stability from protein engineering experiments and conclude with some speculation about the prospects for future subtilisin engineering.

Mutations in well over 50% of the 275 amino acids of subtilisin have been reported in the scientific literature (Table 1). Many more examples exist in the patent literature and undoubtedly still more lurk unfathomed in the freezers of biotechnology companies. Subtilisins constitute a large class of microbial serine proteases, but the ones most mutagenized are those secreted from the Bacillus species amyloliquefaciens (BPN'), subtilis (subtilisin E) and lentus (savinase). Subtilisins are important industrial enzymes as well as models for understanding the enormous rate enhancements effected by enzymes. For these reasons, along with the timely cloning of the gene, ease of expression and purification and availability of atomic resolution structures, subtilisin became a model system for protein engineering studies.

Protein engineering of subtilisin commenced in the mid 1960s when the active site serine 221 was converted to cysteine through chemical modification [101,119]. As it turned out, this first alteration remains one of the most useful. C221 subtilisin is catalytically wounded to the point that it will barely hydrolyze peptide bonds, but it turns out to be quite reactive with certain activated ester substrates [115,116]. This combination of properties has made it a useful tool for catalyzing synthetic reactions. These include condensation of amino acids to form peptides and transesterification reactions such as regioselective acylation of sugars [83,98,187,188].

The first genetic modifications in subtilisin occurred rapidly after the gene was cloned in the early 1980s [72,171,182]. The early standard for genetic manipulation was subtilisin BPN', which was engineered for stability [26,47,183], catalytic mechanism [20,168,180] and substrate specificity [46]. The rationales for modifying subtilisin have expanded over the years to include the following eight broad classifications:

(1) Catalytic mechanism: [15,20,31,32,36,41,97,101,102,104,119-121,129,130,147,148,168,169,178,180,185].

(2) Substrate specificity: [5,6,8,9,28-30,38-40,46,56-58,85,89-91,94,122,123,144,155,156,161,163-165,167,179,181,184].

(3) New activities: [1,3,10,11,60-63,79,114,117,134,152,193].

(4) General proteolytic activity: [54,77,153,154,157,159].

(5) General stability: [4,16,22,23,25-27,34,35,45,48,53,65,74,75,78,80,95,96,99,100,107,110-112,124,132,145,158,160,166,183,190,191,194].

(6) Stability in exotic environments: [33,47,55,109,149,174,186].

(7) Surface activity: [17,18,44,69].

(8) Folding mechanisms: [19,21,24,42,43,49-51,67,68,73,82,86-88,127,128,131,133,138-142,150,151,172,176,177].

Most subtilisin engineering continues to involve catalytic amino acids, substrate binding regions and stabilizing mutations. Included in the active site category are mutations of the catalytic triad (D32, H64, S221), the oxyanion hole (N155) and mutations which influence the pKa of H64 through long range electrostatics. Most mutations affecting specificity have been made in the binding pockets S1 and S4 [12]. The S1 amino acids comprise positions 127, 152, 154, 156 and 166, and the S4 amino acids comprise positions 102, 104, 107, 126 and 128. An excellent review of the use of protein engineering to understand catalytic mechanism and substrate specificity appeared in 1995 [113].



2. Subtilisin stability 

Stability has been the property of subtilisin which has been most amenable to enhancement, yet perhaps least understood. Rationalizing, in structural and energetic detail, the stability increases that result from mutation is limited by the inability to study the folding reaction under equilibrium conditions. The most basic protein stability experiment is determining the free energy of unfolding [70,162], and this quantity has still not been determined for subtilisin. Biosynthesis of subtilisin requires participation of an N-terminal prodomain [71]. Without the prodomain, mature subtilisin folds on a time scale closer to geological than biological. By combining biochemical analysis with information from mutagenesis experiments, however, one can now make an informed estimate of the free energy of folding of mature subtilisin and use this information to better evaluate stabilizing mutations.

2.1. Energetics of the subtilisin folding reaction

2.1.1. Calcium binding

A fundamental variable to address in subtilisin 
stability is its colossal calcium dependence [52,175]. 
Table 1 (continued)

No.  BPN'  Mutation
1    A     C w/C78 [108]
2    Q     R [23]; E, R, W [149]
3    S     C w/C206/Δ75-83 [149]; T [3]
4    V     I [53]
5    P     A, S w/Δ75-83 [149]
6    Y
7    G
8    V     I [25]
9    S     F [191]
10   Q
11   I
12   K
13   A
14   P     L [191]
15   A     K
16   L
17   H
18   S
19   Q     E [45]
20   G
21   Y
22   T     C w/C87 [110,183]
23   G
24   S     C w/C87 [110,183]
25   N
26   V     C w/235 [108]; C w/232 [95]; R [45]
27   K     C w/C89 [108]; R [54,65]
28   V
29   A     C w/C119 [95]
30   V
31   I     L [157]
32   D     N, A [31]; N [51]
33   S     D, E [5]
34   G
35   I
36   D     Q [148]; C w/C210 [95]; insertion of D (savinase) [174]
37   S     [191]
38   S
39   H
40   P
41   D     C w/C80 [95]; Q, A w/Δ75-83 [149]
42   L
43   K     N [134]; N, R w/Δ75-83 [149]
44   V
45   A     Replacement 45-63 with thermitase sequence [16]
46   G
47   G
48   A
49   S     D, R [75]; P [65]
50   M     F [35,111]
51   V     K [45]
52   P
53   S     T [124]
54   E
55   T
56   N
57   P
58   F
59   Q     R
60   D     N (subt E) [33,194]
61   N     C w/C98 [160]
62   N     D [5]; CMM [36]
63   S     D [25]
64   H     A [31]
65   G
66   T
67   H     Y, A [3]
68   V     C [7]
69   A
70   G     A, S w/Δ75-83 [149]
71   T     V [53]
72   V     I [153]
73   A     L, H w/Δ75-83 [149]
74   A
75   L     Deletion 75-83 [19]
76   N     D [99,111,174,191]
77   N     D [45]
78   S     C w/C1 [108]; D [25]
79   I     T
80   G     C w/C41 [95]
81   V
82   L
83   G
84   V
85   A     C w/232 [108]
86   P
87   S     C w/22 and 24 [110,183]; S (savinase) [54]
88   A
89   S     E [45]; E89S (savinase) [65]
90         T
91   Y     I
92   A     T [153]
93   V     I [190]
94   K
95   V
96   L
97   G     D97G (subt E) [33]
98   A     K [45]; C w/C61 [160]
99   D     S, K [147]
100  G     A, V, L [164]
101  S     H, K, E [165]
102  G     F [9]
103  Q     R [33,194]; A [54]
104-155, BPN' residues 105-155: S W I I N G I E W A I A N N M D V I N M S L G G P S G S A A L K A A V D K A V A S G V V V V A A A G N

Mutations listed for 104-155: A, R, D, F, S, W, Y [8]; W [167]; A, F [122,123]; V [174]; D [6]; I [54] / V [35]; G, A, V, L, F [144]; G, A, V [123] / S [99,190] / T, E [124] / S [34,191,194] / C w/C29 [95] / H120D (savinase) [174] / C w/C147 [108] / S [54] / L, I [3] / A, G [3] / I [124]; A, F [144]; G, A, V [123] / A, S, V [156] / F [9]; S128G (savinase) [174] / F [9] / D [33,124,166]; H, K [165] / F [9] / A, V, F [144] / C w/C122 [108] / C w/243 [95] / C, S [3] / A, R, L, F, P, T [161] / L [20]; A, L, H, Q, R [180]

156-203, BPN' residues 157-166: G T S G S S S T V G; 167 I, 168 P, 170 K, 171 Y, 172 P, 173 S, 174 V, 175 I, 176 A, 177 V, 178 G, 179 A, 180 V, 181 D, 182 S, 183 S, 184 N, 185 Q, 186 R, 187 A, 188 S, 189 F, 190 S, 191 S, 192 V, 193 G, 194 P, 195 E, 196 L, 197 D, 198 V, 199 M, 200 A, 201 P, 202 G, 203 V

Mutations listed for 156-203: Q, S [184]; S, K [147]; G [33]; CMM [36] / 158-165 replacement with thermitase sequence [16] / C [191]; deletion 161-164 [155] / R [45]; S164T (savinase) T [53] / C w/C191 [108] / A, S, C, T, P, V, L, I, F, Y, W [46]; D, E, Q, M, K, R [184]; S [124]; D (S); R [191]; CMM [36] / A [111,181] / Y, L, M [65] / D, E [112] / S [33]; N [134]; D [190] / G [33] / P [33,124,132] / C w/C165 [108] / T [191] / S194P (subt E) [65,191]; A194P (savinase) [174] / Q [74]; Q, E, D, F, M, K, R (savinase) [65,174] / N [65,166] / K [17]



204  S     F [17]
205  I     V205I (savinase) [53]
206  Q     C [111]; C w/C3/Δ75-83 [149]; N, D, Y, E, K, I, F, L, W [17]
207  S
208  T
209  L     F [17]
210  P     C w/C36 [95]
211  G     K, P, L, W [96]
212  N     P, A, V, S [96]
213  K     R [35]; T [147]
214  Y     K w/Δ75-83 [149]
215  G
216  A     E [17]
217  Y     L [181]; K [111]; W [134]; CMM [36]
218  N     S, T, A, C, D, W [26]; S [99,111,190]; M [17]; T, A, H [3]
219  G
220  T     A [15]
221  S     C [1,101,119]; A [31]; seleno [10]
222  M     All [47]; Me-S-C [55]; A [134,194]; G, S, A, V, F [3]
223  A     S [3]
224  S     A, C [3]
225  P     A [1]; G [3]
226  H
227  V
228  A
229  G
230  A
231  A
232  A     C w/C85 [108]; C w/C26 [95]
233  L
234  I
235  L     R [45]; K235L (savinase) [174]
236  S
237  K
238  H
239  P     G, K, R [158]
240  N
241  W
242  T
243  N     C w/C148 [95]
244  T
245  Q
246  V
247  R
248  S     N, A, L [66]
249  S     C w/C273 [108]
250  L
251  E     E [65]
252  N
253  T     C w/C273 [108]
254  T     A [124]
255  T     A [33]
256  K     Y [134]
257  L
258  G
259  D
260  S
261  F
262  Y
263  Y
264  G
265  K
266  G
267  L
268  I
269  N     D
270  V
271  Q     E [2,45]; G [65]
272  A
273  A     C w/C249 or C253 [108]
274  A     A (savinase) [54]
275  Q

A universal feature of subtilisins is the presence of one or more calcium binding sites. High resolution X-ray structures of subtilisin BPN', as well as several homologues [13,14,59,93], have revealed details of a conserved calcium binding site, termed site A. Calcium at site A is coordinated by five carbonyl oxygen ligands and one aspartic acid. Four of the carbonyl oxygen ligands to the calcium are provided by a loop comprising amino acids 75-83. The geometry of the ligands is that of a pentagonal bipyramid whose axis runs through the carbonyls of 75 and 79. On one side of the loop is the bidentate carboxylate (D41), while on the other side are the N-terminus of the protein and the side chain of Q2. The seven coordination distances range from 2.3 to 2.6 Å, the shortest being to the aspartyl carboxylate. Three hydrogen bonds link the N-terminal segment to loop residues 78-82 in a parallel β arrangement.

Because of the marginal stability of subtilisin without calcium bound, the energetics of calcium binding at site A are difficult to study independently of the unfolding reaction. By employing an inactive and stabilized version of subtilisin, the calcium-free (apo) form of subtilisin can be produced and calcium binding measured by microcalorimetry and fluorescence spectroscopy [19]. The binding parameters obtained by titration calorimetry are $\Delta H = -11$ kcal/mol and $K_a = 7 \times 10^6\ \mathrm{M^{-1}}$ at 25°C. The standard free energy of binding is 9.3 kcal/mol, so the binding of calcium is primarily enthalpically driven, with only a small net loss in entropy ($\Delta S_{\text{binding}}$ of about 6-7 cal/(mol·K)). This is surprising, since transfer of calcium into water results in a loss of entropy of about 60 cal/(mol·K). Therefore, the freeing of water upon calcium binding to the protein will make a major contribution to the overall $\Delta S$ of the process. This gain in solvent entropy upon binding must be compensated for by a loss in entropy of the protein. Presumably, loop amino acids 75-83 and the first few N-terminal residues have increased mobility when calcium is absent from the A site.

A second ion binding site (site B) is located 32 Å from site A in a shallow crevice between two segments of polypeptide chain near the surface of the molecule. The coordination geometry of this site closely resembles a distorted pentagonal bipyramid. Three of the formal ligands are derived from the protein: the carbonyl oxygen atom of E195 and the two side chain carboxylate oxygens of D197. Four water molecules complete the first coordination sphere. Evidence that site B binds calcium comes from determining the occupancy of the site in a series of X-ray structures from crystals grown in 50 mM NaCl with calcium concentrations ranging from 1 to 40 mM [112]. In the absence of excess calcium, this locus was found to bind a sodium ion. The binding of these two ions appears to be mutually exclusive, so that as the calcium concentration increases, the sodium ion is displaced and a water molecule appears in its place, directly coordinated to the bound calcium [112]. Analysis of occupancy vs. calcium concentration indicates that the calcium binding constant of site B is approx. $40\ \mathrm{M^{-1}}$.

2.1.2. Calcium-independent stability

Subtilisin does not refold to the native state on an observable time scale except under conditions which make direct measurements of the equilibrium constant for folding impractical [64]. Site-directed mutagenesis afforded an opportunity to simplify the subtilisin folding reaction and to test whether a calcium-free mutant subtilisin might fold more readily than the wild type protein. The calcium binding loop is formed from a nine amino acid bubble in the last turn of a 14-residue α-helix involving amino acids 63-85 [93]. Deleting amino acids 75-83 creates an uninterrupted helix and abolishes the calcium binding potential at site A [2,19]. The X-ray structure has shown that, except for the region of the deleted calcium binding loop, the structures of the mutant and wild type proteins are remarkably similar considering the size of the deletion. The structures of subtilisin with and without the deletion superimpose with an rms difference between 261 Cα positions of 0.17 Å. The N-terminus of the wild type protein lies beside the site A loop, furnishing one calcium coordination ligand, the side chain oxygen of Q2. In Δ75-83 subtilisin, the loop is gone, leaving residues 1-4 disordered, but the helix is uninterrupted and shows normal helical geometry over its entire length.

The folding rate of Δ75-83 BPN' is much faster than that of BPN'. Although it is hard to compare their folding rates under similar conditions [64,92], it is certain that Δ75-83 BPN' folds at least $10^4$ times faster than BPN' in 0.1 M KPi, pH 7.0. The unfolding rates of the apo form of BPN' and of Δ75-83 BPN' are very similar [19]. Since $\Delta G_{\text{unfolding}} = -RT\ln(k_{\text{unfolding}}/k_{\text{folding}})$ in a two-state system, the simplest interpretation of the unfolding and refolding rates is that $\Delta G_{\text{unfolding}}$ for Δ75-83 BPN' is at least 5.5 kcal/mol greater at 25°C than for apo BPN'. Recent H-D exchange data indicate that the total $\Delta G_{\text{unfolding}}$ for Δ75-83 BPN' is approx. 7 kcal/mol in 0.1 M KPi, pH 7.0 and (unpublished data). This would leave apo BPN' with a $\Delta G_{\text{unfolding}}$ of at most about 1.5 kcal/mol (7 − 5.5), i.e., near the margin of thermodynamic stability.
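The 5.5 kcal/mol figure follows from applying the two-state relation to both proteins: when their unfolding rates are equal, the difference in $\Delta G_{\text{unfolding}}$ reduces to $RT\ln$ of the folding-rate ratio. The short Python sketch below checks this arithmetic for a folding-rate ratio of $10^4$; the function name is hypothetical, and R = 1.987 × 10⁻³ kcal/(mol·K) and 25°C = 298.15 K are standard values assumed here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def ddg_unfolding(fold_ratio: float, unfold_ratio: float = 1.0,
                  temp_k: float = 298.15) -> float:
    """Difference in two-state unfolding free energy (mutant minus reference), kcal/mol.

    Each protein obeys dG = -RT ln(k_unfold / k_fold), so
    ddG = RT ln(fold_ratio / unfold_ratio), where
    fold_ratio   = k_fold(mutant)   / k_fold(reference)
    unfold_ratio = k_unfold(mutant) / k_unfold(reference)
    """
    return R_KCAL * temp_k * math.log(fold_ratio / unfold_ratio)


ddg = ddg_unfolding(fold_ratio=1e4)  # deletion mutant folds >= 10^4 times faster
apo_max = 7.0 - ddg                  # approx. 7 kcal/mol total for the deletion mutant
print(f"ddG >= {ddg:.2f} kcal/mol; apo BPN' dG <= {apo_max:.2f} kcal/mol")
```

This gives about 5.46 kcal/mol (rounded to 5.5 in the text) and leaves roughly 1.5 kcal/mol for apo BPN'. Because the folding-rate ratio is a lower bound, the computed difference is also a lower bound.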

2.1.3. Calcium-dependent stability

In view of the marginal stability of apo-subtilisin, it is evident that calcium binding makes a dominant contribution to conformational stability. By binding at a specific site in the tertiary structure, calcium contributes its binding energy to the stability of the native state and to the overall free energy of folding. The unfolding reaction of subtilisin BPN' can be divided as follows:

$$\mathrm{N(Ca_2)} \overset{\Delta G_1}{\rightleftharpoons} \mathrm{N(Ca)} + \mathrm{Ca} \overset{\Delta G_2}{\rightleftharpoons} \mathrm{N} + 2\,\mathrm{Ca} \overset{\Delta G_3}{\rightleftharpoons} \mathrm{U} + 2\,\mathrm{Ca}$$

where $\mathrm{N(Ca_2)}$ is the native form of subtilisin with calcium bound to both sites, $\mathrm{N(Ca)}$ is the native form of subtilisin with calcium bound to site A,
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Ifis the folded apoprotein and U is the unfolded pro- 
l^ein. The total free energy of unfolding is therefore 
|cqual to A^,+A^3+A^3. From the binding constant, 
bi one can calculate the contribution of calcium to the 
Upcc energy of subtilisin folding from the equation: 
fiGhin^ms = "RT ln(l -h K^Ca]) 

;;j:Thus the contribution of site A to the stability of 
•'subtihsm m 10 mM calcium is 6.6 kcal/mol at 
25'C. The contribution of calcium binding to site B 
l in 10 mM calcium and 50 mM sodium is only 0 2 
^ kcal/mol. This analysis is at odds with earlier studies 
.:;which concluded that calcium binding to site B is 
.' responsible for the large decrease in the inactivation 
Kiate of subtihsin in the presence of millimolar con- 
• centrations of calcium [16,112]. As shown below, re- 
! . examination of calcium-dependent stability data in 
light of a better understanding of the energetics of 
;v subtihsin folding shows that site B has relatively little 
* effect on subtilisin stability in the presence of mod- 
} erate concentrations of monovalent cations. 
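The binding equation can also be run in reverse to see which association constants the quoted contributions imply. The Python sketch below uses only the equation and the numbers given in the paragraph (6.6 and 0.2 kcal/mol, 10 mM calcium, 25°C). The helper names are hypothetical, and the function reports the stabilization as the positive quantity RT ln(1 + K[Ca]), i.e. -ΔG_binding in the sign convention above.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


def binding_stabilization(k_assoc: float, ca_molar: float, temp_c: float = 25.0) -> float:
    """Magnitude of the calcium contribution, RT ln(1 + K[Ca]), in kcal/mol."""
    rt = R_KCAL * (temp_c + 273.15)
    return rt * math.log1p(k_assoc * ca_molar)


def implied_k_assoc(dg_kcal: float, ca_molar: float, temp_c: float = 25.0) -> float:
    """Invert the same equation: K = (exp(dG/RT) - 1) / [Ca], in M^-1."""
    rt = R_KCAL * (temp_c + 273.15)
    return math.expm1(dg_kcal / rt) / ca_molar


ca = 10e-3                           # 10 mM calcium
k_site_a = implied_k_assoc(6.6, ca)  # site A: 6.6 kcal/mol (text)
k_site_b = implied_k_assoc(0.2, ca)  # site B: 0.2 kcal/mol with 50 mM Na+ (apparent constant)
print(f"site A: K ~ {k_site_a:.1e} M^-1; site B (apparent): K ~ {k_site_b:.0f} M^-1")
print(f"round trip for site A: {binding_stabilization(k_site_a, ca):.2f} kcal/mol")
```

The site B value is an apparent constant, because sodium competes for that site under the stated conditions.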

2.2. Kinetics of irreversible inactivation

In most protein engineering studies of subtilisin, stability is defined in terms of the loss of activity as a function of time. The mechanisms of irreversible inactivation can be complex, involving unfolding, autodigestion, aggregation and chemical damage to certain amino acids. If one wishes to understand stability by this definition, the rate-determining step in inactivation under the specified conditions must be determined. For example, subtilisin can be inactivated with hydrogen peroxide due to the oxidation of the methionine next to the active-site serine [146]. If this occurs, it is irrelevant to activity whether the enzyme remains folded or not. It is also clear that autodigestion will become a relatively more important mechanism of inactivation at high concentrations of enzyme, because it is a second-order reaction. In general, however, studies which measure the rate of inactivation at elevated temperature are indirectly measuring the rate of unfolding, because unfolding becomes the rate-determining step in irreversible inactivation as temperature is increased. This can be seen by directly comparing the rate of unfolding of subtilisin BPN' from calorimetric measurements with the rate of inactivation under the same conditions (Fig. 1). Hence changes in the rate of irreversible inactivation at elevated temperatures resulting from mutation reflect a change in the activation energy for unfolding.

[Fig. 1 plot not reproduced: ln(rate constant) vs. 1/T (K⁻¹), 1/T axis 0.00286 to 0.00296]

Fig. 1. Comparison of the rates of irreversible thermal inactivation and of thermal unfolding in 50 mM Tris-HCl, pH 8.0, 50 mM NaCl, 10 mM CaCl2, over the temperature range of 65-75°C. Unfolding rates are measured by differential scanning calorimetry. Data are plotted as the natural logarithm of the rate constants vs. 1/T (K⁻¹). Solid circles show the rate of unfolding and open circles show the rate of inactivation. The activation energy of both processes is approx. 80 kcal/mol at 65°C.
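An activation energy like the approx. 80 kcal/mol in Fig. 1 comes from the slope of ln k against 1/T (slope = -Ea/R). The Python sketch below fits that line by ordinary least squares. The rate constants are hypothetical: they are generated from an assumed Ea of 80 kcal/mol over the 65-75°C range of Fig. 1 (1/T of about 0.00296 to 0.00287 K⁻¹), not taken from the published data.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


def arrhenius_ea(temps_c, rates):
    """Least-squares slope of ln k vs 1/T; returns Ea = -slope * R in kcal/mol."""
    xs = [1.0 / (t + 273.15) for t in temps_c]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -(sxy / sxx) * R_KCAL


# Hypothetical rate constants (s^-1) from an assumed Ea of 80 kcal/mol, k = 1e-3 s^-1 at 65 C.
ea_true, k_ref, t_ref = 80.0, 1e-3, 65.0
temps = [65.0, 67.5, 70.0, 72.5, 75.0]
rates = [k_ref * math.exp(-ea_true / R_KCAL * (1 / (t + 273.15) - 1 / (t_ref + 273.15)))
         for t in temps]
ea_fit = arrhenius_ea(temps, rates)
print(f"fitted Ea = {ea_fit:.1f} kcal/mol")
```

With real unfolding and inactivation data, the comparison in Fig. 1 amounts to checking that the two fitted slopes agree.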

Stabilizing mutations in subtilisin characterized by 
changes in the kinetics of inactivation can be classi- 
fied into three groups: (1) stabilizing only in calcium, 
(2) stabilizing only in chelants, and (3) stabilizing in 
both conditions (Table 2). From this partitioning it is 
evident that the mechanism of thermal inactivation 
differs depending on whether the calcium sites are 
occupied. To understand why this is so, one must 
understand how the kinetics of inactivation are re- 
lated to the kinetics of unfolding and how the ki- 
netics of unfolding are related to the kinetics of cal- 
cium loss. 

2.2.1. Inactivation in excess EDTA

Thermal inactivation in EDTA is a two step pro- 
cess as shown in mechanism 1 : 

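$$\mathrm{N(Ca)} + \mathrm{EDTA} \rightleftharpoons \mathrm{N} + \mathrm{Ca{:}EDTA} \rightarrow \mathrm{U} \rightarrow \mathrm{I} \qquad (1)$$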

Fig. 2 compares the rate of calcium dissociation with 
the rate of unfolding as a function of temperature for 
an inactive variant of subtilisin BPN' [19]. Reparti- 
tioning of calcium from site A into a strong chelator 
Table 2

Stabilizing mutants in calcium

BPN'                  10 mM CaCl2   10 mM EDTA   Ref.
V8I                   2.0           0.8          [25]
S63D                  1.1           0.6          [25]
G131D                 1.5           0.9          [124]
G169A                 5.9           1.1          [111]
L126I                 1.4           1.1          [124]
A116E                 1.3           1.0          [124]
(rows illegible in source)                       [124]
S188P                 1.8           1.0          [124]
P172D                 1.5           1.1          [112]
T234A                 2.0           1.0          [124]
N109S                 +                          [99]
loop 45-63            10.0                       [16]

BPN'
Q19E/Q271E            2.0                        [45]
N77K                  1.3                        [45]

BPN'                  50 µM calcium
K256Y                 6.6                        [134]

Subtilisin E          1 mM calcium
9F                    1.4                        [191]
P14L                  1.5                        [191]
N76D                  1.6                        [191]
N118S                 +                          [191]
S161C                 3.0                        [191]
G166R                 2.0                        [191]
N181D                 3.0                        [191]
S194P                 7.0 (P in BPN')            [191]
N218S                 2.7                        [191]

Subtilisin E          1 mM calcium
C61-C98               2.3                        [160]

(further rows illegible in source)               [65]



occurs at a rate of 5 h⁻¹ at 45°C. The kinetic barrier to calcium removal is 23 kcal/mol. Calcium is an integral part of the subtilisin structure, and its association or dissociation requires significant but transient disruption of the surrounding protein-protein interactions. This disruption in structure would explain the high activation energy and slow kinetics of calcium binding and dissociation. For example, breaking main-chain hydrogen bonds between the N-terminal region and the 75-83 loop region would allow the relatively buried calcium a passageway into or out of the protein. Global unfolding in 10 mM EDTA at 45°C is much slower than calcium dissociation, however, occurring at a rate of 0.04 h⁻¹ with an activation energy of approx. 60 kcal/mol. Thus the predominant mechanism of inactivation in EDTA is calcium dissociation followed by unfolding and loss of activity.
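Mechanism 1 can be sketched as two consecutive first-order steps, N(Ca) → N at 5 h⁻¹ followed by N → U at 0.04 h⁻¹ (both rates at 45°C, from the text). The sketch below is a simplification of our own: it treats calcium release as irreversible because EDTA is in large excess, and it counts both folded species as active. After a short lag, the active fraction decays at essentially the unfolding rate, which is why the slower step governs inactivation.

```python
import math

K_DISSOC = 5.0   # h^-1, calcium release from site A into EDTA at 45 C (text)
K_UNFOLD = 0.04  # h^-1, global unfolding of apo-subtilisin at 45 C (text)


def active_fraction(t_h: float, k1: float = K_DISSOC, k2: float = K_UNFOLD) -> float:
    """Folded (active) fraction for N(Ca) -> N -> U with first-order steps (k1 != k2)."""
    n_ca = math.exp(-k1 * t_h)                                             # N(Ca)
    n_apo = k1 / (k2 - k1) * (math.exp(-k1 * t_h) - math.exp(-k2 * t_h))  # N
    return n_ca + n_apo


for t in (0.5, 2.0, 10.0, 24.0):
    print(f"t = {t:>4} h: active = {active_fraction(t):.3f}, "
          f"exp(-k2*t) = {math.exp(-K_UNFOLD * t):.3f}")
```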

Because calcium binding reaches equilibrium quickly relative to the rate of unfolding, mutations which stabilize in EDTA must stabilize apo-subtilisin. Increasing the binding constant for one of the calcium sites would not help unless the increase in binding affinity were enormous. Consider a typical experiment in which 1 mM EDTA is added to 100 µg/ml subtilisin (3.6 µM) bound to a stoichiometric amount of calcium. The calcium will partition between subtilisin and EDTA according to the equation:

$$\frac{[\mathrm{SCa}]}{[\mathrm{S}_{\mathrm{total}}]} = \frac{K_{\text{S-Ca}}[\mathrm{S}]}{1 + K_{\text{S-Ca}}[\mathrm{S}] + K_{\text{E-Ca}}[\mathrm{E}]}$$

where [SCa]/[S_total] is the fraction of subtilisin bound to calcium, [S] is total subtilisin and [E] is total EDTA. Since the binding constant of subtilisin for
Fig. 2. Comparison of the rates of calcium dissociation in excess fluorescent chelator (quin2) with the rate of thermal unfolding for the inactive subtilisin mutant S11 [19]. The activation energies are 23 kcal/mol for calcium dissociation in quin2 and 60 kcal/mol for unfolding in 50 mM Tris-HCl, pH 8.0, 50 mM NaCl, 10 mM EDTA, at 45°C. Data are plotted as the natural logarithm of the rate constants vs. 1/T (K⁻¹). Solid circles show the rate of unfolding and open circles show the rate of calcium dissociation.

calcium at site A (K_S-Ca) is 7 × 10⁶ M⁻¹ and the binding constant of EDTA for calcium (K_E-Ca) is 2 × 10⁸ M⁻¹, less than 0.02% of the subtilisin would be bound to calcium at equilibrium. Examples of mutations which stabilize apo-subtilisin are M50F and the disulfides C22-C87 and C206-C216. The irony is that a mutation which preferentially stabilizes apo-subtilisin relative to the bound form will weaken calcium binding and catalyze inactivation under conditions of excess calcium and high temperature (see mechanism 2 below). This phenomenon is displayed in the M50F mutant, which is more stable than wild type in 10 mM EDTA but less stable in 10 mM CaCl2 (Table 2).
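The partition equation can be evaluated directly for the example experiment (3.6 µM subtilisin, 1 mM EDTA). In the Python sketch below, the constants K_S-Ca = 7 × 10⁶ M⁻¹ and K_E-Ca = 2 × 10⁸ M⁻¹ are assumed inputs, and the function name is hypothetical. The second calculation shows why only an enormous increase in affinity would matter: even a 1000-fold tighter site A leaves most of the enzyme calcium-free.

```python
def fraction_bound(k_s_ca: float, k_e_ca: float, s_total: float, e_total: float) -> float:
    """Fraction of subtilisin carrying site-A calcium, per the partition equation."""
    s_term = k_s_ca * s_total
    e_term = k_e_ca * e_total
    return s_term / (1.0 + s_term + e_term)


# Assumed association constants (M^-1) and the concentrations of the example experiment.
K_S_CA = 7e6      # subtilisin, site A (assumption)
K_E_CA = 2e8      # EDTA (assumption)
S_TOTAL = 3.6e-6  # 100 ug/ml subtilisin, M
E_TOTAL = 1e-3    # 1 mM EDTA, M

f = fraction_bound(K_S_CA, K_E_CA, S_TOTAL, E_TOTAL)
f_tight = fraction_bound(1000 * K_S_CA, K_E_CA, S_TOTAL, E_TOTAL)  # hypothetical tighter site
print(f"bound with wild-type affinity: {f:.4%}")
print(f"bound with 1000-fold tighter site A: {f_tight:.1%}")
```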

2.2.2. Inactivation in excess calcium

The inactivation of subtilisin in excess calcium is diagrammed in mechanism 2:



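$$\begin{array}{ccccc}
\mathrm{N(Ca_2)} & \overset{K_a\,(\text{site B})}{\rightleftharpoons} & \mathrm{N(Ca)} + \mathrm{Ca} & \overset{K_a\,(\text{site A})}{\rightleftharpoons} & \mathrm{N} + 2\,\mathrm{Ca} \\
\downarrow k_1 & & \downarrow k_2 & & \downarrow k_3 \\
\mathrm{U} & & \mathrm{U} & & \mathrm{U} \\
 & & & & \downarrow\ \gg k_3 \\
 & & & & \mathrm{I}
\end{array} \qquad (2)$$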



In excess calcium (e.g. ≥1 mM) and moderate temperature, calcium binding and dissociation are in rapid equilibrium because calcium binding is much faster than unfolding. The rate of inactivation is determined by the fraction of each native species times its unfolding rate. Using mechanism 2, one can show that calcium-dependent stabilization of subtilisin is dominated by site A rather than site B. Fig. 3 plots the rate of inactivation of BPN' at 65°C as a function of calcium concentration and fits the data to the following mechanism:



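$$\begin{array}{ccccc}
\mathrm{N(Ca_2)} & \overset{33\ \mathrm{M^{-1}}}{\rightleftharpoons} & \mathrm{N(Ca)} + \mathrm{Ca} & \overset{2.5\times10^{5}\ \mathrm{M^{-1}}}{\rightleftharpoons} & \mathrm{N} + 2\,\mathrm{Ca} \\
\downarrow 0.0035\ \mathrm{s^{-1}} & & \downarrow 0.0085\ \mathrm{s^{-1}} & & \downarrow 8.7\ \mathrm{s^{-1}} \\
\mathrm{U} & & \mathrm{U} & & \mathrm{U} \\
 & & & & \downarrow\ >25\ \mathrm{s^{-1}} \\
 & & & & \mathrm{I}
\end{array} \qquad (3)$$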

The mechanism predicts that the K_a values of site A and site B are 2.5 × 10⁵ M⁻¹ and 33 M⁻¹ at 65°C. The rate of inactivation of subtilisin with only site A occupied (NCa) is about 1000 times slower than that of apo-subtilisin (N), and the rate of inactivation with both sites occupied (NCa2) is about 2.5 times slower than with only site A occupied. The second prediction has been borne out by measuring the calcium-dependent stability of a mutant which has site B but lacks site A [149]. The rate of inactivation of this mutant is only 2.4 times slower in 10 mM CaCl2, 50 mM NaCl than in 10 mM EDTA, 50 mM NaCl.
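Under rapid binding equilibrium, the observed inactivation rate in mechanism 3 is the population-weighted average of the three unfolding rates. The Python sketch below uses the fitted constants shown in mechanism 3 and assumes, as the scheme does, that site B fills only after site A. It reproduces the two ratios quoted above (about 1000-fold and about 2.5-fold) and shows the rate falling from 8.7 s⁻¹ toward the N(Ca₂) limit as calcium increases.

```python
K_A = 2.5e5      # M^-1, site A (mechanism 3)
K_B = 33.0       # M^-1, site B (mechanism 3)
KU_N = 8.7       # s^-1, unfolding of apo-subtilisin N
KU_NCA = 0.0085  # s^-1, unfolding of N(Ca)
KU_NCA2 = 0.0035 # s^-1, unfolding of N(Ca2)


def k_inactivation(ca_molar: float) -> float:
    """Observed rate = sum over native species of (population fraction x unfolding rate)."""
    w_n = 1.0                           # relative weight of N
    w_nca = K_A * ca_molar              # relative weight of N(Ca)
    w_nca2 = K_A * K_B * ca_molar ** 2  # relative weight of N(Ca2)
    total = w_n + w_nca + w_nca2
    return (w_n * KU_N + w_nca * KU_NCA + w_nca2 * KU_NCA2) / total


for ca in (0.0, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1):
    print(f"[Ca] = {ca:.0e} M: k = {k_inactivation(ca):.4f} s^-1")
print(f"N/N(Ca) rate ratio: {KU_N / KU_NCA:.0f}; N(Ca)/N(Ca2): {KU_NCA / KU_NCA2:.1f}")
```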

Another prediction of mechanism 2 is that any 
mutations which stabilize only in the presence of 
calcium will increase the binding constant for calci- 
um to one or both of the calcium sites. This can be 
either through effects on the binding sites themselves, 
as proposed for mutations A116E, G131D, P172D, 
S63D, N76D, S78D and K256Y and the thermitase 
loop 45-63 in BPN', or through indirect effects on 
conformational stability as seen for mutations V8I, 
S53T, L126I, G166S, G169A and T254A (Table 2). 
The indirect effect on calcium binding arises because 
apo-subtilisin displays a loss of cooperativity in the 
unfolding reaction [19]. Thus many mutations which 
stabilize in the presence of calcium do not stabilize 
in the presence of EDTA, because they do not influ- 
ence the rate determining step in the unfolding of 
Fig. 3. The rates of thermal inactivation of subtilisin BPN' at 65°C are plotted as a function of calcium concentration. The data are fit to mechanism 3 in the text. Data taken from Fig. 1 of Pantoliano et al. [112].



apo-subtilisin. In fact, most mutations identified by 
random mutagenesis stabilize only in the presence of 
calcium. These mutants increase calcium binding af- 
finity because they preferentially stabilize NCa rela- 
tive to N. The premise that the effects of this class of 
mutations indirectly increase calcium affinity by in- 
creasing general stability was tested by introducing 
G166S, G169A and T254A into the rehabilitated S88 version of Δ75-83 subtilisin [126]. Because the unfolding of the S88 subtilisin is cooperative in EDTA, these mutations now stabilize subtilisin S88 in 50 mM Tris-HCl, pH 8.0, 50 mM NaCl, 10 mM EDTA to approximately the same extent that they stabilize subtilisin BPN' in 50 mM Tris-HCl, pH 8.0, 50 mM NaCl, 10 mM CaCl2.

Finally, mutations which stabilize in excess calcium
and in EDTA to the same extent must stabilize N 
and NCa to equal extents. This would result in no 
change in calcium affinity. Mutations of this class are 
N218S, Y217K, Q206Cox and Q271E [2,111]. 

2.2.3. Disulfide mutants

Because of the slow rate of the subtilisin folding reaction, most stability experiments are affected only by the activation energy for unfolding and not by the equilibrium constant for unfolding. This immediately explains why engineering disulfide bonds into subtilisin was so spectacularly unsuccessful in increasing resistance to thermal inactivation [95,108]. A well-designed disulfide cross-link should stabilize a protein by decreasing the entropic cost of folding. The loss of conformational entropy in a polymer due to a cross-link has been estimated by calculating the probability that the ends of a polymer will simultaneously occur in the same volume element (v_s), according to the equation:

$$\Delta S = -R \ln\left[\left(\frac{3}{2\pi N l^{2}}\right)^{3/2} v_s\right]$$

where N is the number of segments and l is the length of a segment [118]. Good agreement with experimental data for protein cross-linking has been achieved using l = 3.8 Å and v_s = 58 Å³, judged to be the closest approach of two -SH groups [106].

Of 18 different disulfide cross-links which have been engineered into subtilisin, three have increased stability [108,110,160]. Two of these stabilize only in the presence of EDTA. This is not surprising in retrospect, because effects on the stability of the unfolded state would not generally be manifested in the activation energy of the unfolding reaction: the transition state for the unfolding reaction appears to be compact, with a slightly larger heat capacity than the native state. Further analysis of one of the disulfide mutants (C22-C87) in the background of Δ75-83 BPN' showed that the disulfide did in fact have the predicted effects on the unfolded state [150]. The increase in the energy of the unfolded state due to cross-linking 57 amino acids (22-87 minus the nine-amino-acid deletion) would be 4.2 kcal/mol at 25°C, so the predicted maximum increase in folding rate at 25°C would be approx. 1000-fold. Since the 22-87 disulfide accelerated folding by 850-fold at 25°C in 0.1 M KPO4, pH 7.2, the acceleration of the folding rate is qualitatively consistent with the simple statistical mechanical model and suggests that amino acids 22 and 87 are ordered in the transition state for folding. Accordingly, the small influence of the disulfide on the transition state for unfolding wild-type BPN' in EDTA (Table 2) indicates that residues 22 and 87 are only slightly less ordered in the transition state for unfolding in EDTA than in the folded state. Other mutations which preferentially decrease the entropy of the unfolded state relative to the folded state, such as substituting for glycine or substituting with proline, are likewise not necessarily expected to influence the rate of unfolding.
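The 4.2 kcal/mol and approx. 1000-fold figures follow directly from the cross-link entropy equation with N = 57, l = 3.8 Å and v_s = 58 Å³. The Python sketch below reproduces them. As in the text, it treats the maximum folding acceleration as exp(TΔS/RT), which assumes the whole entropic penalty falls on the unfolded state and none on the folding transition state. The function name is hypothetical.

```python
import math

R_CAL = 1.987  # gas constant, cal/(mol*K)


def crosslink_entropy(n_segments: int, seg_len: float = 3.8, v_s: float = 58.0) -> float:
    """Entropy lost by the unfolded chain on cross-linking, cal/(mol*K), as a positive number.

    dS = -R ln[(3 / (2 pi N l^2))^(3/2) v_s], with l in Angstrom and v_s in Angstrom^3.
    """
    p_close = (3.0 / (2.0 * math.pi * n_segments * seg_len ** 2)) ** 1.5 * v_s
    return -R_CAL * math.log(p_close)


T = 298.15                         # 25 C
ds = crosslink_entropy(57)         # 22-87 loop minus the nine-residue deletion
tds_kcal = T * ds / 1000.0         # destabilization of the unfolded state, kcal/mol
max_rate_gain = math.exp(ds / R_CAL)  # exp(T*dS / RT): upper bound on folding acceleration
print(f"dS = {ds:.1f} cal/(mol K); T*dS = {tds_kcal:.1f} kcal/mol; "
      f"max folding acceleration ~ {max_rate_gain:.0f}-fold")
```

The observed 850-fold acceleration sits just below this ceiling, which is the basis for the conclusion that residues 22 and 87 are ordered in the folding transition state.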
Two engineered disulfide bond mutants have resulted in significant decreases in the rate of unfolding. One is a disulfide between residues 61 and 98 in subtilisin E, which was modeled after a naturally occurring disulfide in aqualysin I from Thermus aquaticus [160]. The other is a disulfide identified by random mutagenesis of Δ75-83 subtilisin, which cross-links residues 3 and 206 [149]. The 61-98 cross-link in subtilisin E slows thermal inactivation by 2.3-fold. The 3-206 cross-link in Δ75-83 subtilisin slows inactivation by 17-fold. The 3-206 disulfide links the N-terminal strand of subtilisin with the 202-219 β-hairpin. Evidently disruption of the interactions between these two structural elements is involved in the transition state for unfolding Δ75-83 subtilisin. The 3-206 cross-link increases the folding rate of Δ75-83 subtilisin by only 1.8-fold [126]. Evidently, ordering of these residues occurs after the transition state for the folding reaction.

2.2.4. Random mutagenesis 

Random mutagenesis and screening proved to be 
an effective method to dramatically increase stability 
even without much understanding of the energetics 
of the subtilisin folding reaction. There are two ma- 
jor reasons for this. First, stabilizing mutations are 
fairly common. Although subtilisins are naturally ro- 
bust, on the order of 1% of the random amino acid 
changes measurably increase the half-time of thermal 
inactivation [124]. Second, contributions from indi- 
vidual stabilizing mutations generally accrue cumu- 
latively. Thus large increases in stability can be 
achieved with no radical changes in the tertiary pro- 
tein structure but rather minor, independent altera- 
tions. 
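If individual mutations act independently, their effects on the activation free energy for unfolding add, so their effects on the inactivation rate multiply. The Python sketch below illustrates that bookkeeping. The per-mutation fold effects and the 65°C temperature are hypothetical values chosen for illustration, and independence is an assumption that real mutation pairs may violate.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)
T = 338.15         # 65 C, an assumed assay temperature


def ddg_activation(fold_slower: float) -> float:
    """Increase in the unfolding activation free energy implied by a fold decrease in rate."""
    return R_KCAL * T * math.log(fold_slower)


# Hypothetical per-mutation effects (fold decrease in the rate of thermal inactivation).
effects = [2.0, 1.5, 3.0, 2.5, 2.0, 4.0]

total_ddg = sum(ddg_activation(f) for f in effects)  # additive free energies
combined = math.exp(total_ddg / (R_KCAL * T))        # converted back to a rate factor
print(f"sum of ddG = {total_ddg:.2f} kcal/mol -> combined {combined:.0f}-fold slower")
```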

Random mutations have been introduced in vari- 
ous ways, including chemical mutagens, mutagenic 
base analogs, error prone PCR and spiked synthetic 
oligonucleotides. The key element in the process is 
the ability to screen large numbers of mutants for 
increased stability. Phenotypic screening has been carried out using plate or microtiter dish assays, which allow assaying proteases from approx. 100-1000 mutant clones per plate or dish. To screen for
stable mutants, secreted subtilisins are incubated at 
elevated temperature long enough to largely inacti- 
vate the wild type enzyme. When an assay for hydro- 
lytic activity is subsequently performed, only mutants 



with stability greater than wild type will exhibit mea- 
surable activity. Once stable mutants are identified, 
the corresponding colony can be grown up to iden- 
tify the mutation. The labor factor in screening limits 
the number of mutants which can be examined to the 
10-*-10^ range. All single amino acid substitutions in 
subtilisin would yield a total of 5500 different variations. Since all combinations of double substitutions would produce 3 × 10⁷ variations, only the population of single mutations in subtilisin has been adequately searched for stabilizing events. In fact, even the population of single substitutions has not been completely explored, because the nature of the genetic code dictates that each amino acid can be changed to an average of only six other amino acids by a single base substitution in the gene. Thus only about 30% of the possible single-substitution mutants would be produced from single base substitutions.
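The "about six of nineteen" figure can be checked against the standard genetic code. The Python sketch below counts, for each of the 61 sense codons, the distinct amino acids (other than its own, excluding stops) reachable by one base change, and then averages. Averaging over codons rather than over amino acids is our simplification, since a real gene uses particular codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in TTT, TTC, TTA, TTG, TCT, ... order; '*' marks stop codons.
AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
      "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA)}


def reachable(codon: str) -> set:
    """Amino acids (not the codon's own, not stop) reachable by a single base substitution."""
    own = CODE[codon]
    out = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                aa = CODE[codon[:pos] + base + codon[pos + 1:]]
                if aa not in ("*", own):
                    out.add(aa)
    return out


sense = [c for c, aa in CODE.items() if aa != "*"]
avg = sum(len(reachable(c)) for c in sense) / len(sense)
frac = avg / 19
print(f"{avg:.1f} amino acids reachable per codon on average ({frac:.0%} of 19)")
```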

Early studies with chemical mutagens found eight 
stabilizing mutations in BPN' by screening at most 
1200 different single amino acid substitutions 
[26,27,124]. Misincorporation induced by α-thio-deoxynucleotides identified three additional stabilizing mutations in BPN' [35], and studies using error-prone PCR to introduce mutations in subtilisin E identified 11 stabilizing mutations [191]. Five of the mutations in subtilisin E were previously identified as
stabilizing in BPN'. The fact that several of the same 
mutations have been independently selected indicates 
that many of the stabilizing mutations which can be 
produced with single base substitutions have been 
identified. Since this represents only 30% of the total 
possible single amino acid substitutions, many other 
stabilizing single substitutions must exist. Two exam- 
ples are the directed mutations Y217K and Q206C 
which both stabilize significantly but are not acces- 
sible by a single point mutation [111]. Further, Miyazaki and Arnold have shown that targeting random
mutagenesis to positions at which stabilizing changes 
were already found can identify even better amino 
acids at these positions [96]. 

Once stabilizing single amino acid changes have
been identified, building a highly stable subtilisin 
can be accomplished in a step by step manner by 
combining individual mutations into the same mole- 
cule. A combination of six stabilizing changes in 
BPN' decreased the rate of thermal inactivation by 
> 300-fold [111]. A similar result was achieved in 
subtilisin E by performing multiple rounds of ran-
dom mutagenesis screening and molecular breeding 
screening [191]. A hyperstable calcium-free subtilisin 
has also been constructed by a combination of design 
and random mutagenesis. This mutant inactivates 
250000 times more slowly than wild type BPN' in 
10 mM EDTA [126,149]. 



3. Future prospects

3.1. Design vs. screening 

What strategies will prove most effective for engi- 
neering other properties of subtilisin? At the moment 
directed evolution seems to have become more fash- 
ionable than structure-based design as a method to 
'engineer' subtilisin. Part of this trend may be a re- 
sult of earlier disappointments with the ability to 
predict the phenotype of designed mutants, but 
most is a result of advances in random mutagenesis 
methods [76,135,190,192]. For example, synthesis of 
oligonucleotides using preformed trinucleotide phos- 
phoramidites will circumvent some of the limitations 
inherent to the genetic code [81]. Furthermore new 
methods of DNA shuffling allow efficient creation of 
chimeric proteases to try and combine desirable 
properties from parent enzymes [103,137,173]. Di- 
rected evolution and molecular breeding methods 
have proven useful for finding mutations which are 
better than wild type for several different properties 
[136]. There is always the danger, however, that the 
good will become the enemy of the best [125]. The 
new techniques do not circumvent the combinatorial 
problems inherent to purely random methods. Thus 
random approaches will be good for improving a 
global property such as stability which can be ac- 
crued incrementally but will not be successful when 
significant improvements depend on synergistic mu- 
tational events. Relying on the accumulation of single mutants ensures that only solutions very close to
the starting structure will be found. The best solu- 
tions may lie unmined a few layers deeper in muta- 
tional space. 

Optimizing subtilisin activity for a specific protein 
sequence or for a new substrate are cases in which 
synergistic mutations probably will be required. Consider the basic organization of the substrate binding pockets of subtilisin. Although the deep S1 and S4 binding clefts are the primary determinants of substrate specificity, subtilisin is relatively non-specific in its cleavage preferences for protein substrates. The broad specificity is in part a consequence of the fact that the substrate peptide backbone inserts itself between residues 100-104 and 125-129 to become the central strand in an antiparallel β-sheet. This is different from the more specific chymotrypsin family of proteases, in which a structural equivalent of residues 100-104 is absent [113]. The best solutions to accommodate new substrates may involve altering main-chain interactions, and this will involve multiple synergistic mutations. When high-resolution structural information becomes available for the subtilisin class of prohormone converting enzymes, it will be interesting to see what structural differences account for sequence-specific processing activity.
Introducing the bias of intelligent design into random mutagenesis experiments has been criticized because of limitations in the intelligence of designers. The dilemma is as follows. The more the target positions for mutagenesis are restricted, the greater the ability of screening to identify synergistic mutations. But the greater the restriction of the target positions, the greater the danger of flawed design. In many cases, however, only minimal design is required to identify productive regions of sequence space. Past experience with directed mutagenesis has shown that the mutations which have the greatest influence on substrate specificity involve either direct contacts with the substrate or electrostatic changes in the vicinity of the active site. This is also borne out by experience with random mutagenesis and screening. For example, You, Chen and Arnold have randomly mutated subtilisin E using error-prone PCR and screened for increased activity in dimethylformamide against a defined peptide substrate [33,189]. Twelve mutations were identified in the screen. Of the twelve, two are involved in direct binding with the peptide, three are mutations of Asp or Glu to neutral amino acids at positions which would influence the pKa of H64, five are mutations which increase general stability, and only two are at positions whose connections with activity in DMF are difficult to rationalize.
3.2. Phage display selection

Recent successes in displaying subtilisin on the surface of phagemid particles greatly expand the possibilities for selecting new properties [3,37,84]. While less direct than culture dish or microtiter plate methods for screening, phage display methods increase the number of mutants which can be screened by at least four orders of magnitude. The ability to display libraries of 1 × 10^ independent mutants allows screening all combinations of amino acids at six specified positions. The obvious limitation of phage display is that selection is achieved by binding activity, so that selection of a catalytic event is not trivial. In one case, random mutations at 25 positions were introduced into S221C subtilisin to select for improved peptide ligation. Ligase activity allowed product capture by the ligation of the subtilisin phagemids with improved ligase activity to a biotin-tagged peptide [3]. A second study successfully displayed fully active subtilisin on phage, although this involved addition of the subtilisin inhibitor CI2 to the culture medium. Selection for a change in P4 specificity was then carried out using a biotin-linked peptide diphenylester inhibitor [84].
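Randomizing six positions gives 20⁶ = 6.4 × 10⁷ amino acid combinations, which is why phage-display library sizes make exhaustive coverage of six positions feasible. The Python sketch below computes that count and, as an illustration only, the DNA-level diversity if NNK degenerate codons (32 per position) were used. The expected coverage assumes idealized uniform random sampling (coverage = 1 - exp(-library/diversity)), which real libraries only approximate; neither NNK nor this coverage model comes from the text.

```python
import math

POSITIONS = 6
AA_COMBOS = 20 ** POSITIONS   # protein-level diversity at six randomized sites
NNK_COMBOS = 32 ** POSITIONS  # DNA-level diversity if NNK codons were used (assumption)


def expected_coverage(library_size: float, diversity: float) -> float:
    """Expected fraction of variants seen at least once, assuming uniform random sampling."""
    return -math.expm1(-library_size / diversity)


print(f"amino acid combinations: {AA_COMBOS:.1e}; NNK DNA sequences: {NNK_COMBOS:.1e}")
for lib in (1e7, 1e8, 1e9):
    print(f"library of {lib:.0e}: ~{expected_coverage(lib, AA_COMBOS):.0%} of amino acid combinations")
```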

3.3. Uncoupling prodomain processing from selection

A major limitation to any screening/selection method is that mutations affecting catalytic activity potentially affect the biosynthesis of subtilisin, which is linked to autoprocessing of the prodomain [51]. Hence the selection of mutants will be biased toward enzymes which efficiently autoprocess. If the desired phenotype is activity toward a particular amino acid sequence, then the autoprocessing mechanism actually might be used to aid in selection by mutating the processing site on the prodomain to the target sequence [5,84]. This is apparently what occurred in the natural evolution of prohormone converting enzymes, since the C-terminal sequence of the prodomain reflects the processing specificity [143]. If the desired phenotype is activity against a novel substrate, however, one needs to uncouple the biosynthesis of subtilisin from the selection for the new activity. This has been accomplished by using the Δ75-83 version of subtilisin, which is capable of folding without the prodomain [2,3,19,37].



3.4. Full circle 



The first genetically engineered subtilisin appeared 
in the literature in 1985 and addressed the sensitivity 
of subtilisin to oxidation by peroxide [47]. It had 
been determined earlier that M222 is sensitive to ox- 
idation leading to inactivation of the enzyme [146]. 
While it was clear that substituting for M222 would 
prevent this mechanism of inactivation by peroxide, 
it was not clear what amino acid would best substi- 
tute for methionine in providing optimal substrate 
interactions and preserving activity. For this reason, 
all 19 substitutions were made and the catalytic and 
stability properties of each compared. Thus even the 
first example of genetic-based protein engineering in 
subtilisin was in fact a random mutagenesis experi- 
ment which could be targeted to just one position 
because of detailed biochemical and structural infor- 
mation. After 15 years the best approach to 'engi- 
neering' desired properties into subtilisin probably 
remains targeted random mutagenesis, in which tar- 
get selection is informed by all available information. 
A view across sand dunes in the Sahara. A study of wind-driven sand transport in the north-western Algerian Sahara identifies a previously unrecognized mechanism, page 532. (Photo: Frank Lane.)
itently found forms of evidence at scenes of crimes, the consequences for forensic science are considerable, page 543.



|jg!nebula cycle 

pnty years after they were 
predicted, a new class of cosmological X-ray source is discovered. Ring nebula NGC6888 is the first, pages 518 and 486.

Resistance evasion

A bacterial pathogen of the pepper plant that has mutated to evade host recognition has a transposable element in a gene responsible for the plant's hypersensitive response, page 541.

'Greenhouse' gas rising

Levels of atmospheric methane, a candidate for contributing to global warming, are increasing. Radiocarbon data suggest that over 30 per cent of atmospheric methane is derived from fossil carbon, pages 522 and 489.

Developmental switch 

The switch from mitosis to 
meiosis in yeast has been pinned 
down to the inhibition of a 
protein kinase by a product of a 
gene specifically activated in
diploid cells, page 509. 



Lochs more bonnie

Have reductions in sulphur emissions and acid rain deposition in the past decades led to improvements in the environment? Chemical and diatom analyses of a pair of Scottish lochs give some of the answers, page 530.

Brain power

Electron microscopy shows the brain protein MAP 1C, thought to be responsible for the transport of cytoplasmic organelles, to be structurally similar to dynein, the force-generating protein in cilia and flagella. See page 561.

Titanic collisions 

Earthly laboratory experiments 
provide evidence to support the 
idea that the nitrogen gas pres- 
ent on Saturn's moon Titan 
formed from ammonia as a 
result of high-velocity collisions 
with meteors, page 520. 

Great Lakes battle 

Despite an 'invasion' from the 
north by a voracious predator, 
the factors limiting the algal 
biomass in Lake Michigan seem 
to be related to nutrient supply, 
not a prey/predator balance, 
pages 537 and 491.

Guide to Authors 

Facing page 568. 
Fig. 2 Inhibition of ¹²⁵I-labelled pooled human IgG binding to high-affinity Fc receptors (FcRI) on U937 cells by monomeric mouse IgG2b immunoglobulins. (○), Wild-type IgG2b; (●), Glu235→Leu mutant IgG2b. For methods see Fig. 3 legend.







Fig. 3 Scatchard plot of ¹²⁵I-labelled mutant Glu235→Leu mouse IgG2b binding to high-affinity receptors (FcRI) on U937 cells. r, number of moles of ¹²⁵I-(Glu235→Leu) mouse IgG2b antibody bound per mole of cells; A, concentration of free ¹²⁵I-mutant IgG2b. The number of receptors per cell is lower than those previously reported, but a Scatchard analysis of ¹²⁵I-labelled pooled human IgG binding to the U937 cells was similar (not shown); the diminished values for receptor number may be caused by growing U937 to high cell concentrations (0.9 × 10 per ml).
Methods. The IgG-FcRI binding assay was essentially as previously described, except that after introduction of water-immiscible oil to the equilibrium mixture followed by rapid centrifugation, the pelleted cells (bound ¹²⁵I-IgG) and medium (free ¹²⁵I-IgG) were separated by slicing through the tube within the oil layer.
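In a Scatchard plot, r/A is graphed against r using the quantities defined in the Fig. 3 legend. For a single class of independent sites, r/A = K(n - r), so the slope gives -K and the x-intercept gives n, the receptors per cell. The Python sketch below fits that line by least squares. The data are synthetic and noise-free, generated from an assumed K = 1 × 10⁸ M⁻¹ and n = 2 × 10⁴ per cell, and are not the published measurements.

```python
def scatchard_fit(bound, free):
    """Least-squares line through r/A vs r; returns (K in M^-1, n sites per cell).

    Assumes one class of independent sites: r/A = K(n - r), slope = -K, x-intercept = n.
    """
    xs = list(bound)
    ys = [r / a for r, a in zip(bound, free)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    k = -slope
    return k, intercept / k


# Synthetic data from an assumed K = 1e8 M^-1 and n = 2e4 receptors per cell.
K_TRUE, N_TRUE = 1e8, 2e4
free = [0.2e-9, 0.5e-9, 1e-9, 2e-9, 5e-9, 10e-9]                # A, free ligand (M)
bound = [N_TRUE * K_TRUE * a / (1 + K_TRUE * a) for a in free]  # r, bound per cell
k_fit, n_fit = scatchard_fit(bound, free)
print(f"K = {k_fit:.2e} M^-1, n = {n_fit:.0f} receptors per cell")
```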

(cleaved between 233 and 234) resulted in a loss of binding to human FcRI, although in these two cases the two CH2 domains of the antibody are no longer tethered together by the hinge disulphides. In the alignment of ref. 12, antibodies with substitutions at residues 231 and 233 still bind tightly to FcRI, but those with changes at residue 234 have a reduced affinity. Furthermore, residues 236-238 are completely conserved, except in mouse IgG1 and human IgG2, which do not bind to human FcRI. Much of the link, in particular residues 234-238, may therefore be required for binding to human FcRI.

The hinge link is mobile in the crystallographic structure of human Fc and is accessible to proteolytic attack. Thus papain cleaves between residues 233 and 234 in mouse IgG2a and IgG2b; pepsin between residues 234 and 235 in human IgG1 and residues 238 and 239 in mouse IgG1; and thermolysin between residues 234 and 235 in human IgG1. The facile proteolysis of several IgG isotypes in this region may simply reflect the underlying design of the FcRI binding site. The site appears to be accessible and flexible and would permit, for example, a hinge dislocation on binding to FcRI.

In conclusion, our results suggest that the hinge link, either as a single flexible strand or paired with the strand from the other heavy chain, is a major determinant in binding of antibody to FcRI, and we would predict that changing Leu235 for glutamic acid (and perhaps other side chains) would destroy the interaction of human IgG1 or IgG3 with FcRI. The possibility of turning on and off the interaction of antibody with human FcRI could help dissect the role of this receptor in phagocytosis and cell-mediated lysis and in antibody therapy. Furthermore, in imaging of solid tumours, eliminating interactions with FcRI could help reduce background due to antibody binding to cells with high-affinity receptors in the lymphatics, liver and spleen.
Dissecting the catalytic triad of a serine protease

Paul Carter & James A. Wells 

Department of Biomolecular Chemistry, Genentech Inc.,
460 Point San Bruno Boulevard, South San Francisco,
California 94080, USA





Serine proteases are present in virtually all organisms and function both inside and outside the cell; they exist as two families, the 'trypsin-like' and the 'subtilisin-like', that have independently evolved a similar catalytic device characterized by the Ser, His, Asp triad, an oxyanion binding site, and possibly other determinants that stabilize the transition state (Fig. 1). For Bacillus amyloliquefaciens subtilisin, these functional elements impart a total rate enhancement of at least 10^9 to 10^10 times the non-enzymatic hydrolysis of amide bonds. We have examined the catalytic importance and interplay between residues within the catalytic triad by individual or multiple replacement with alanine(s), using site-directed mutagenesis of the cloned B. amyloliquefaciens subtilisin gene. Alanine substitutions were chosen to minimize unfavourable steric contacts and to avoid imposing new charge interactions or hydrogen bonds from



Table 1 Kinetic parameters of mutant subtilisins with the substrate N-succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide at pH 8.60

                        Active site configuration
Enzyme                  Ser221  His64  Asp32   kcat (s^-1)           Km (μM)    kcat/Km (s^-1 M^-1)   kcat(mutant)/kcat(S24C)
Wild type                 +       +      +     (4.4 ± ?) × 10^1      180 ± 10   (2.5 ± 0.1) × 10^5    0.74 ± 0.01
S24C                      +       +      +     (5.9 ± 0.2) × 10^1    220 ± 20   (2.7 ± 0.2) × 10^5    1
S24C:S221A                -       +      +     (3.4 ± 0.1) × 10^-5   420 ± 40   (8.2 ± 0.6) × 10^-2   (5.8 ± 0.1) × 10^-7
S24C:H64A                 +       -      +     (3.8 ± 0.2) × 10^-5   390 ± 50   (9.6 ± 1.0) × 10^-2   (6.4 ± 0.2) × 10^-7
S24C:D32A                 +       +      -     (2.3 ± 0.2) × 10^-3   480 ± 80   4.7 ± 0.7             (3.8 ± 0.2) × 10^-5
S24C:D32A:H64A            +       -      -     (2.6 ± 0.1) × 10^-4   270 ± 50   (9.4 ± 1.6) × 10^-1   (4.3 ± 0.1) × 10^-6
S24C:H64A:S221A           -       -      +     (2.8 ± 0.2) × 10^-5   290 ± 40   (9.6 ± 1.3) × 10^-2   (4.8 ± 0.2) × 10^-7
S24C:D32A:S221A           -       +      -     (2.8 ± 0.1) × 10^-5   310 ± 40   (9.2 ± 0.9) × 10^-2   (4.8 ± 0.1) × 10^-7
S24C:D32A:H64A:S221A      -       -      -     (3.0 ± 0.1) × 10^-5   230 ± 20   (1.3 ± 0.1) × 10^-1   (5.1 ± 0.1) × 10^-7
Buffer (no enzyme)        none                 (1.1 ± 0.1) × 10^-8                                    (1.9 ± 0.1) × 10^-10




Mutants are abbreviated by the single-letter code for the wild-type amino acid followed by its codon position and the amino acid replacement; multiple mutants are designated by listing single mutant components separated by colons (for example, the double mutant Ser24 to Cys, Ser221 to Ala is designated S24C:S221A). Construction of the mutants S24C and H64A and the double mutant S24C:H64A was as described. The mutations D32A and S24C were constructed simultaneously using a 48-mer oligonucleotide, and the S221A mutant was constructed by cassette mutagenesis. The remaining multiple mutants were constructed by 3-way ligations using a 6 kb EcoRI/BamHI fragment from the vector pSS5 (B. Cunningham, P. Powers and J. W., unpublished) and two subtilisin fragments from appropriate mutants. Mutant constructions were verified by dideoxy sequencing. Mutant plasmids were expressed in a protease-deficient strain of B. subtilis, BG2036. Rescue of active site mutants by co-culturing with the mutant A48E, and purification, were as described. Mutant subtilisins were assayed with the substrate N-succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide (Sigma). Six hydrolysis assays were performed simultaneously against substrate blanks in 1 ml 100 mM Tris-HCl (pH 8.60), 4% (v/v) dimethylsulphoxide at (25 ± 0.2) °C using a Kontron Uvikon 860 spectrophotometer. Initial reaction rates were determined from the increase in absorbance at 410 nm on release of p-nitroaniline (ε410 = 8,480 M^-1 cm^-1). The total substrate concentration in each assay was determined from A410 after complete hydrolysis. The initial rate data were fitted to the Michaelis-Menten relationship using least squares analysis to determine Km and Vmax. Turnover number (kcat) was calculated from the spectrophotometrically determined enzyme concentration (ε280 0.1% = 1.17). Enzyme concentrations in the assays were 30-110 μg ml^-1 for the active site mutants and 1 μg ml^-1 for the wild type and S24C enzymes. Catalytic triad residues are represented by (+) and Ala replacements by (-). Data are presented ± standard errors, and the spontaneous hydrolysis rate of substrate under these conditions is shown as 'buffer'.
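The footnote's route from raw absorbance readings to a turnover number can be sketched as below. This is a minimal illustration, not the authors' analysis code. It assumes a 1 cm path length. The subtilisin molar mass of 27,500 g/mol used to convert mg/ml into molar is a hypothetical placeholder chosen for illustration, and the function names are likewise hypothetical. Only ε410 = 8,480 M^-1 cm^-1 and ε280 (0.1%) = 1.17 come from the footnote.

```python
# Illustrative sketch: from A410 slopes and A280 readings to kcat.
# Assumptions (not from the source): 1 cm path length; molar mass 27,500 g/mol.

EPSILON_410 = 8480.0      # M^-1 cm^-1, p-nitroaniline at 410 nm (Table 1 footnote)
EPSILON_280_01PCT = 1.17  # A280 of a 1 mg/ml (0.1% w/v) enzyme solution (Table 1 footnote)
PATH_LENGTH_CM = 1.0      # assumed cuvette path length
MOLAR_MASS = 27500.0      # g/mol, hypothetical placeholder for subtilisin


def rate_from_slope(delta_a410_per_min: float) -> float:
    """Beer-Lambert: convert an initial slope (A410 units per min) to M s^-1."""
    molar_per_min = delta_a410_per_min / (EPSILON_410 * PATH_LENGTH_CM)
    return molar_per_min / 60.0


def enzyme_molarity(a280: float) -> float:
    """Enzyme concentration (M) from A280; note that mg/ml equals g/l."""
    grams_per_litre = a280 / EPSILON_280_01PCT
    return grams_per_litre / MOLAR_MASS


def turnover_number(v_max: float, enzyme_molar: float) -> float:
    """kcat (s^-1) = Vmax (M s^-1) / total enzyme concentration (M)."""
    return v_max / enzyme_molar


if __name__ == "__main__":
    v = rate_from_slope(0.01)   # a slow reaction: 0.01 A410 units per minute
    e = enzyme_molarity(0.117)  # A280 of 0.117 corresponds to 0.1 mg/ml
    print(f"v = {v:.3e} M/s, [E] = {e:.3e} M, v/[E] = {turnover_number(v, e):.3e} s^-1")
```

The ratio v/[E] equals kcat only when v is the fitted Vmax. With a single sub-saturating rate it underestimates kcat.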



substituted side chains. In contrast to the effect of mutations in residues involved in substrate binding, the mutations in the catalytic triad greatly reduce the turnover number and cause only minor effects on the Michaelis constant. Kinetic analyses of the multiple mutants demonstrate that the residues within the triad interact synergistically to accelerate amide bond hydrolysis by a factor of ~2 × 10^6.

Subtilisin is synthesized as a membrane-associated precursor (preprosubtilisin). When expressed in a protease-deficient strain of B. subtilis, mature B. amyloliquefaciens subtilisin is efficiently released into the medium after autoproteolytic cleavage. Mutagenesis of the catalytic residues in subtilisin (which essentially inactivates the protease) disrupts this processing, but processing can be restored by co-culturing the mutants with a small amount of a B. subtilis strain (called a 'helper') harbouring an active subtilisin gene. We have constructed a series of active site mutants in which the catalytic triad residues are replaced by alanine in every possible combination (ref. 12, Table 1). Each mutant also contains a surface-accessible Ser24 to Cys mutation designated S24C (mutant enzymes are named using the single letter code for amino acids to indicate the substitutions made, see Table 1). The S24C substitution permits reversible attachment to an activated thiol sepharose column, thereby eliminating traces of contaminating helper subtilisin, which is cysteine-free.
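The naming scheme, together with the '+'/'-' convention of the active site configuration columns, can be expressed as a small parser. This is an illustrative sketch only. The function names are hypothetical and are not part of the original work.

```python
# Illustrative sketch: decode mutant names such as "S24C:D32A:H64A:S221A".
# Each component is <wild-type residue><codon position><replacement residue>;
# components are separated by colons (spaces are tolerated and ignored).

TRIAD = ("S221", "H64", "D32")  # Ser221, His64, Asp32, in the table's column order


def parse_mutant(name):
    """Return a list of (wild-type residue, position, replacement) tuples."""
    if name.strip().lower() == "wild type":
        return []
    substitutions = []
    for token in name.replace(" ", "").split(":"):
        substitutions.append((token[0], int(token[1:-1]), token[-1]))
    return substitutions


def triad_configuration(name):
    """Return '+' for each intact triad residue and '-' for each replaced one."""
    replaced = {f"{wt}{pos}" for wt, pos, _ in parse_mutant(name)}
    return " ".join("-" if residue in replaced else "+" for residue in TRIAD)


for enzyme in ("Wild type", "S24C", "S24C:H64A", "S24C:D32A:H64A:S221A"):
    print(f"{enzyme:<22} {triad_configuration(enzyme)}")
```

Because S24C is not a triad position, it never changes the configuration string. This matches the tables, where S24C carries a full triad.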

The hydrolysis of the substrate (N-succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide) by most of the active site mutants produced only small absorbance changes (ΔA410 of 0.01 to 0.10) over long periods (up to 12 h), yet the data exhibit typical Michaelis-Menten saturation behaviour (Fig. 2) with standard errors almost as small as those for wild-type subtilisin (Table 1). No detectable loss of catalytic activity occurred even during the longest kinetic runs. In addition, the background (non-enzymatic) hydrolysis of substrate was ≤25% of the catalysed rate for even the least active enzymes (Fig. 2). The non-enzymatic hydrolysis was subtracted directly from the enzyme assays using blank substrate solutions in a double beam spectrophotometer.

Kinetic analysis of the active site single mutants (Table 1) shows that replacement of the catalytic serine, histidine, or aspartate causes a drop in turnover number (kcat) by factors of 2 × 10^6, 2 × 10^6 and 3 × 10^4, respectively. The 100-fold lower values of kcat that result from substitution of Ser221 and His64, compared with Asp32, are consistent with their more central role in catalysis (Fig. 1). Each mutation causes a small (~2-fold) increase in the Michaelis constant (Km), which may result from slightly altered substrate binding contacts. (Wild-type subtilisin has a two-step enzyme mechanism in which deacylation is >33 times faster than acylation, so that Km is a good approximation of the enzyme-substrate dissociation constant (Ks). As the enzyme mechanism must be changed for at least some of the mutants (see below), Km may be less than Ks.)
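The fold reductions quoted above are ratios of turnover numbers. The sketch below reproduces them. It assumes the Table 1 turnover numbers are S24C 5.9 × 10^1 s^-1, S24C:S221A 3.4 × 10^-5 s^-1, S24C:H64A 3.8 × 10^-5 s^-1 and S24C:D32A 2.3 × 10^-3 s^-1. These are the powers of ten consistent with the kcat(mutant)/kcat(S24C) column. The function name is hypothetical.

```python
# Illustrative sketch: fold reductions in kcat for the single triad mutants (pH 8.60).
# kcat values in s^-1, as read from Table 1 (see the lead-in for the assumed exponents).
KCAT_PH86 = {
    "S24C": 5.9e1,
    "S24C:S221A": 3.4e-5,
    "S24C:H64A": 3.8e-5,
    "S24C:D32A": 2.3e-3,
}


def fold_drop(mutant, reference="S24C"):
    """How many times lower the mutant's kcat is than the reference kcat."""
    return KCAT_PH86[reference] / KCAT_PH86[mutant]


for mutant in ("S24C:S221A", "S24C:H64A", "S24C:D32A"):
    # one significant figure, as in the text (2 x 10^6, 2 x 10^6, 3 x 10^4)
    print(f"{mutant:<12} kcat lower by {fold_drop(mutant):.0e}")
```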

Additional mutagenesis of the S24C:S221A enzyme to replace either Asp32, His64 or both causes essentially no further change in kcat or Km (Table 1). By comparison, further mutagenesis of the S24C:D32A parent enzyme to substitute His64, or both His64 and Ser221, further reduces kcat 9- and 76-fold, respectively, with essentially no change in Km. These data suggest that His64 provides a catalytic advantage of ~10-fold to the S24C:D32A enzyme, and that Ser221 provides a ~10-fold advantage to the S24C:D32A:H64A enzyme. As with the S24C:S221A family of mutants, additional mutations in the S24C:H64A enzyme to replace Ser221, or both Ser221 and Asp32, do not affect kcat. But replacement of Asp32 alone in the S24C:H64A mutant, to give S24C:D32A:H64A, actually increases kcat 7-fold. Thus, Asp32 is a liability to the S24C:H64A enzyme, possibly because of an unfavourable electrostatic effect upon catalysis (see below).

The single and multiple mutant analyses show that the catalytic effects are non-additive in two ways. First, there is a gross discrepancy between the relative drop in kcat resulting from the triple alanine mutant (2 × 10^6, Table 1) compared with
Table 2 Kinetic parameters of mutant subtilisins with the substrate N-succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide at pH 9.70

                        Active site configuration
Enzyme                  Ser221  His64  Asp32   kcat (s^-1)           Km (μM)      kcat/Km (s^-1 M^-1)   kcat(pH 9.7)/kcat(pH 8.6)
Wild type                 +       +      +     (6.3 ± 0.1) × 10^1    440 ± 30     (1.4 ± 0.1) × 10^5    1.4 ± 0.1
S24C                      +       +      +     (8.1 ± 0.2) × 10^1    560 ± 30     (1.5 ± 0.1) × 10^5    1.4 ± 0.1
S24C:S221A                -       +      +     (5.4 ± 0.3) × 10^-5   650 ± 90     (8.4 ± 1.0) × 10^-2   1.6 ± 0.1
S24C:H64A                 +       -      +     (1.9 ± 0.1) × 10^-4   1300 ± 150   (1.5 ± 0.2) × 10^-1   5.1 ± 0.2
S24C:D32A                 +       +      -     (1.8 ± 0.1) × 10^-2   1400 ± 120   (1.3 ± 0.1) × 10^1    7.8 ± 0.4
S24C:D32A:H64A            +       -      -     (1.8 ± 0.1) × 10^-3   460 ± 40     3.8 ± 0.3             6.9 ± 0.3
S24C:H64A:S221A           -       -      +     (5.2 ± 0.2) × 10^-5   480 ± 60     (1.1 ± ?) × 10^-1     1.9 ± 0.1
S24C:D32A:S221A           -       +      -     (5.9 ± 0.3) × 10^-5   460 ± 80     (1.3 ± 0.2) × 10^-1   2.1 ± 0.1
S24C:D32A:H64A:S221A      -       -      -     (7.8 ± 0.3) × 10^-5   730 ± 70     (1.1 ± 0.1) × 10^-1   2.6 ± 0.1
Buffer (no enzyme)        none                 (2.8 ± 0.1) × 10^-8                                      2.5 ± 0.1 (non-enzymatic rate, pH 9.7/pH 8.6)



Kinetic data were determined as for Table 1, except that 100 mM 3-[cyclohexylamino]-2-hydroxy-1-propanesulphonic acid buffer (pH 9.70) was used. Ionic strength was normalized with NaCl.



the product of the relative effects from the three single alanine mutants (~10^17). Second, the double alanine mutants that retain singly the catalytic Ser, His or Asp are only a factor of 8, 0.9 or 0.9 larger in kcat, respectively, than the triple alanine mutant. The product of these values (~6) is much below the relative kcat value of 2 × 10^6 for wild type (S24C) compared with the triple alanine mutant. Thus, non-additive effects are shown either by subtraction of catalytic residues relative to the wild-type enzyme or by addition of single catalytic residues relative to the triple alanine mutant.
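Both non-additivity comparisons can be reproduced numerically. Independent (additive) free-energy contributions would correspond to multiplying fold changes in rate, so each test compares an observed ratio with a product. This minimal sketch assumes the pH 8.60 turnover numbers from Table 1, with the powers of ten that agree with the table's kcat(mutant)/kcat(S24C) column.

```python
# Illustrative sketch of the two non-additivity comparisons (pH 8.60, kcat in s^-1).
from math import prod

KCAT = {
    "S24C": 5.9e1,
    "S24C:S221A": 3.4e-5,
    "S24C:H64A": 3.8e-5,
    "S24C:D32A": 2.3e-3,
    "S24C:D32A:H64A": 2.6e-4,    # retains only Ser221
    "S24C:D32A:S221A": 2.8e-5,   # retains only His64
    "S24C:H64A:S221A": 2.8e-5,   # retains only Asp32
    "S24C:D32A:H64A:S221A": 3.0e-5,
}
TRIPLE = "S24C:D32A:H64A:S221A"

# (1) Removing residues: observed drop for the triple mutant vs. product of single drops.
observed_drop = KCAT["S24C"] / KCAT[TRIPLE]
single_drops = [KCAT["S24C"] / KCAT[m] for m in ("S24C:S221A", "S24C:H64A", "S24C:D32A")]
predicted_if_additive = prod(single_drops)

# (2) Adding single residues back to the triple mutant.
add_back = [KCAT[m] / KCAT[TRIPLE]
            for m in ("S24C:D32A:H64A", "S24C:D32A:S221A", "S24C:H64A:S221A")]

print(f"observed drop {observed_drop:.1e} vs product of single drops {predicted_if_additive:.1e}")
print(f"product of add-back factors {prod(add_back):.1f} vs observed gain {observed_drop:.1e}")
```

With these unrounded values the add-back product is about 7.5, rather than the ~6 obtained from the rounded factors 8, 0.9 and 0.9. Either way it is more than five orders of magnitude below 2 × 10^6.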

Replacement of residues in the catalytic triad with alanines necessarily perturbs the enzyme mechanism. In particular, it has been observed that in the absence of the catalytic His64 in subtilisin, or the catalytic Asp102 in trypsin, there is a marked increase in the hydroxide dependence of catalysis between pH 8 and 10 compared to the wild-type enzymes. Comparisons of the kinetic parameters for all of the catalytic triad mutants at pH 9.70 and pH 8.60 (Table 2) show that those retaining Ser221 have a substantially stronger pH dependence of kcat (increased 5- to 8-fold) than enzymes containing an intact catalytic triad (increased 1.4-fold) or enzymes lacking Ser221 (increased 1.6- to 2.6-fold), or than the non-enzymatic rate (increased 2.5-fold). For all enzymes the Km values at pH 9.70 are increased between 1.5- and 3.3-fold. Preliminary evidence suggests that this effect upon Km may result (at least partially) from ionization of Tyr104, resulting in electrostatic repulsion of the P5 succinyl group (see Fig. 3, and D. Estell, T. Graycar, D. Powers and J. A. Wells, unpublished results).
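The grouping of pH effects described above can be checked directly against the last column of Table 2. This is a minimal sketch. The dictionary transcribes that column, and the helper name is hypothetical.

```python
# Illustrative sketch: grouping the kcat(pH 9.7)/kcat(pH 8.6) ratios of Table 2.
PH_RATIO = {
    "Wild type": 1.4, "S24C": 1.4,
    "S24C:S221A": 1.6, "S24C:H64A": 5.1, "S24C:D32A": 7.8,
    "S24C:D32A:H64A": 6.9, "S24C:H64A:S221A": 1.9,
    "S24C:D32A:S221A": 2.1, "S24C:D32A:H64A:S221A": 2.6,
}
NON_ENZYMATIC = 2.5  # ratio for buffer alone


def triad_group(name):
    """Classify an enzyme by whether its triad is intact and whether it keeps Ser221."""
    if ":" not in name:  # wild type and S24C carry the complete triad
        return "intact triad"
    return "lacking Ser221" if "S221A" in name else "retaining Ser221"


groups = {}
for name, ratio in PH_RATIO.items():
    groups.setdefault(triad_group(name), []).append(ratio)

for label, ratios in groups.items():
    print(f"{label:<17} {min(ratios)}-{max(ratios)}-fold")
print(f"{'non-enzymatic':<17} {NON_ENZYMATIC}-fold")
```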

For mutants that retain Ser221, the simplest interpretation of the data is that they continue to use Ser221 as the catalytic nucleophile. The presence of Ser221 provides a catalytic advantage of ~10-fold to the S24C:D32A:H64A enzyme and ~100-fold to the S24C:D32A enzyme. Furthermore, replacing His64 in the S24C:D32A enzyme causes kcat to drop ~10-fold, suggesting that His64 functions here to some extent (presumably as a proton acceptor for the nucleophilic Ser221). In addition, if deprotonation of the Ser221 hydroxyl is a prerequisite for nucleophilic attack in these mutants, then it is reasonable for kcat to depend on hydroxide ion concentration, as observed (Table 2). Finally, in the absence of His64, the catalytic aspartate should inhibit deprotonation of Ser221 and have a deleterious electrostatic effect upon kcat, as indeed was found (kcat for S24C:H64A is 10-fold lower than kcat for S24C:D32A:H64A in Table 1). As for wild-type subtilisin, we anticipate that the Ser221-containing enzymes should have a two-step enzyme mechanism. For these mutants, if deacylation is rate determining, it is possible that the Km values are substantially less than the Ks values.
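The relation between Km and Ks invoked here follows from the standard acyl-enzyme scheme. As a short worked derivation, assume rapid-equilibrium substrate binding, and write k2 for acylation and k3 for deacylation (these symbols are not defined in the original text):

$$
E + S \;\underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}}\; E\cdot S \;\xrightarrow{k_{2}}\; E\text{-}Ac + P_{1} \;\xrightarrow{k_{3}}\; E + P_{2},
\qquad K_s = \frac{k_{-1}}{k_{1}}
$$

$$
k_{cat} = \frac{k_{2}k_{3}}{k_{2}+k_{3}}, \qquad
K_m = K_s\,\frac{k_{3}}{k_{2}+k_{3}}
$$

If deacylation is more than 33 times faster than acylation ($k_3 > 33\,k_2$), then $k_3/(k_2+k_3) > 33/34 \approx 0.97$, so $K_m \approx K_s$, as stated for wild-type subtilisin. If instead deacylation is rate-determining ($k_3 \ll k_2$), then $K_m \approx K_s\,k_3/k_2 \ll K_s$, which is the case considered in the preceding paragraph.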

For the S24C:S221A family of enzymes, the reaction cannot proceed by the usual serine acyl-enzyme intermediate. Instead, direct attack of water on the scissile peptide bond may occur to produce a single tetrahedral intermediate that collapses to give the hydrolysed products. Nucleophilic attack by water is




Fig. 1 Schematic diagram showing the rate-limiting acylation step in the hydrolysis of peptide bonds by subtilisin. In going from the Michaelis enzyme-substrate complex (E·S) to the transition state complex (E·S‡), the proton on Ser221 (darkly shaded) is transferred to His64, thus permitting nucleophilic attack on the scissile peptide bond. The proton is then transferred to the amine leaving group to generate the acyl-enzyme intermediate (E-Ac). Asp32 (as for Asp102 in trypsin) is believed to position the correct tautomer of His64 for catalysis in the E·S complex and to stabilize the protonated form of His64 in the E·S‡ complex. Some of the hydrogen bonds that form in the E·S‡ complex are shown by dotted lines. In deacylation these steps are reversed, and water (as the nucleophile) replaces the amine leaving group.
consistent with the weak hydroxide dependence of kcat for the S221A-containing mutants. The lack of a deleterious electrostatic effect from Asp32 is also consistent with a neutral attacking nucleophile (compare S24C:H64A:S221A with S24C:D32A:H64A:S221A in Table 1). It is unlikely that the S221A group of enzymes use the other members of the catalytic triad, because there is no additional kinetic advantage for including His64 or Asp32. (Strictly, we cannot be sure that the residual members of the triad are catalytically inert. We simply cannot detect any catalytic advantage for them over the residual activity resulting from determinants unrelated to the triad; see below.) Preliminary X-ray analysis of the S221A enzyme indicates no large structural change except for the Ser221 to Ala substitution (R. Bott and M. Ultsch, personal communication). More kinetic and structural data will be necessary, however, to substantiate the possible mechanisms discussed above.

The small values of kcat for the active site mutants raise questions regarding protease contaminants or assay artefacts. The following evidence argues strongly against these possibilities. (1) Unlike wild-type subtilisin, the mutant enzymes are not inhibited by phenylmethylsulphonyl fluoride. (2) Although changes in the Km values are small for these mutants, many are statistically different from wild type (Tables 1, 2). A contamination with helper subtilisin (regardless of amount) would give a constant value for the Km, equal to wild type. (3) Many of the active site mutants differ significantly from each other in kcat and Km at pH 8.6 (Table 1), which is inconsistent with a constant contaminant. (4) The mutants differ among themselves and from wild type in the pH dependence of kcat (Table 2), a result inconsistent with a fixed protease contaminant. (5) Although the kinetic values reported in Tables 1 and 2 are from the same batch of enzyme, most mutant enzymes have been purified more than once. In every repeat case (data not shown) the kinetic values agree within the standard error limits shown (≤15% for kcat and Km), even though enzyme yields varied and purification protocols were sometimes slightly modified. (6) The mutants were expressed in an extracellular protease-deficient strain of B. subtilis, purified on activated thiol sepharose, and judged to be >99% pure by silver-stained SDS-PAGE. Moreover, further purification of the S24C:H64A enzyme by native gel electrophoresis gave kinetic values identical to those of the starting material.

It is formally possible that the residual activity in some or all of these mutants occurs at a non-specific site(s) distinct from the active site. The following points argue for catalysis at the active site. (1) In some cases the kinetic effects are cumulative for mutagenesis at the active site. For example, the kcat values decrease in the following order: S24C > S24C:D32A > S24C:D32A:H64A > S24C:D32A:H64A:S221A (Table 1). (2) The Km values are usually not more than twofold above the wild type value, suggesting continued strong and specific binding (assuming Km ≈ Ks). Furthermore, the active site mutants show a similar pH-dependent increase in Km as wild-type subtilisin. (3) The substrate preferences of the S24C:D32A and S24C:S221A enzymes toward two other substrates essentially parallel those of the wild-type enzyme (P. C., unpublished results). The substrate specificity of the S24C:H64A enzyme also parallels the wild type, except for a strong preference for His P2 substrates (see below). (4) The activity of the S24C:H64A enzyme is heat denaturable (C. Mitchinson, unpublished results), which indicates that the native protein conformation is critical for catalysis. (5) The residual activity of even the least active mutant is still >10^3-fold above the non-enzymatic rate. This catalytic rate is in the range measured for 'good' catalytic antibodies. Taken together, these data provide compelling evidence that the residual catalytic activities we have measured are not due to protease contamination, assay artefacts or non-specific catalysis away from the normal active site.

We suggest that the residual activity in the triple mutant is 
derived from remaining binding determinants which stabilize
Fig. 2 Initial rate of hydrolysis, v0 (ΔA410 per unit time), versus the concentration of the substrate N-succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide [S] in the absence (•) or presence (○) of S24C:D32A:H64A:S221A subtilisin. The background hydrolysis rate (•) was subtracted directly from the rate in the presence of subtilisin to give the enzymatic rate (○). Experiments were performed in 100 mM Tris-HCl, pH 8.60, at 25 ± 0.2 °C, as described in Table 1. Inset (■) shows an Eadie-Hofstee plot of the initial rate data.
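The Eadie-Hofstee plot in the Fig. 2 inset rearranges the Michaelis-Menten equation into v = Vmax - Km·(v/[S]). A straight line of v against v/[S] therefore has intercept Vmax and slope -Km. This minimal sketch fits that line by ordinary least squares on hypothetical noise-free data. Note that the constants reported in Tables 1 and 2 came from a least-squares Michaelis-Menten fit, not from this linearization.

```python
# Illustrative sketch: Eadie-Hofstee linearization, v = Vmax - Km * (v/[S]).

def eadie_hofstee_fit(substrate, rates):
    """Least-squares line of v against v/[S]; returns (Vmax, Km)."""
    x = [v / s for s, v in zip(substrate, rates)]  # v/[S]
    y = list(rates)                                 # v
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, -slope


# Hypothetical noise-free data: Km = 230 uM, Vmax = 1 (arbitrary rate units).
K_M_TRUE, V_MAX_TRUE = 230e-6, 1.0
conc = [50e-6, 100e-6, 200e-6, 400e-6, 800e-6]  # [S] in M
rates = [V_MAX_TRUE * s / (K_M_TRUE + s) for s in conc]

v_max, k_m = eadie_hofstee_fit(conc, rates)
print(f"Vmax = {v_max:.3f}, Km = {k_m * 1e6:.0f} uM")
```

With real data the linearization weights points unevenly, because v appears on both axes. That is one reason to report constants from the direct non-linear fit.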



the transition state complex outside the catalytic triad. In fact, previous data show that when the hydrogen bond to Asn155 in the oxyanion binding site (Fig. 1) is disrupted by site-directed mutagenesis, there is a 10^2- to 10^3-fold drop in kcat with little effect upon Km. Additional hydrophobic interactions (Fig. 3) with
the P1 substrate side chain and binding interactions with the P2 to P4 substrate residues are estimated to contribute independently factors of 10 to 100 to catalysis. Structural analysis suggests there are additional hydrogen bonds in the transition state complex between the NH of Ser221 and the oxyanion, and between the NH of the P1 substrate residue and the carbonyl of Ser125. Deriving the total catalytic contribution from the sum of these individual binding components may lead to overestimation because of their possible interdependence. Nonetheless, our data indicate that some or all of these determinants are important for stabilizing the tetrahedral transition state complex (contributing >10^3 to kcat), and are not simply required for positioning the substrate for optimal nucleophilic attack by Ser221.

From an evolutionary point of view, it is extremely unlikely that the catalytic triad arose in one step rather than through active intermediates. This view is now apparently complicated by the fact that the residues in the catalytic triad function in an extremely synergistic manner. But, assuming that the present-day enzyme is a reasonable model of its ancestor, there are at least two possible mutagenic pathways that give progressive increases in catalytic rate by stepwise introduction of the residues in the triad. In the first pathway, installing Ser221 followed by His64 and then Asp32 gives progressive increases of 8, 9 and 3 × 10^4 in kcat (Table 1). This progression is even more uniform under alkaline conditions, resulting in increases in kcat of 50, 10 and 5 × 10^3 (Table 2). A second mutagenic pathway is possible by preferential use of a His P2 substrate (Fig. 3) in place of the catalytic His64. We have previously shown that the Ala64 enzyme has a turnover number of 2 × 10^? for hydrolysis of a His P2 substrate, compared to 8 × 10^? for an Ala P2 substrate. This catalytic advantage, which we have called 'substrate-assisted catalysis', makes it feasible to reverse the order of introducing His64 and Asp32.
Fig. 3 Stereoview of a model containing the substrate N-succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide (bold lines and filled atoms) bound to the active site of B. amyloliquefaciens subtilisin. Alpha carbons from important enzyme and substrate residues are labelled. In protease substrate nomenclature the substrate may be represented as

NH2-Pn...P1-C(=O)-NH-P1'...Pn'-COOH,

where the scissile peptide bond is between the P1 and P1' residues. The E·S model is based upon a preliminary 2.0 Å X-ray structure of a product bound to subtilisin; the succinyl and p-nitroanilide groups were introduced by modelling (R. Bott and M. Ultsch, unpublished data). This model is similar to a previously published complex.
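The P/P' labels in this nomenclature (due to Schechter and Berger) count outward from the scissile bond. A minimal sketch follows, with a hypothetical helper name, applied to the assay substrate. The succinyl group becomes P5 and Phe becomes P1, which is why the text refers to the "P5 succinyl group".

```python
# Illustrative sketch: Schechter-Berger subsite labels relative to the scissile bond.

def subsite_labels(groups, p1_index):
    """Label groups (listed N- to C-terminal). The scissile bond lies between
    groups[p1_index] (P1) and groups[p1_index + 1] (P1')."""
    labels = {}
    for i, group in enumerate(groups):
        if i <= p1_index:
            labels[f"P{p1_index - i + 1}"] = group  # P1, P2, ... toward the N terminus
        else:
            labels[f"P{i - p1_index}'"] = group     # P1', P2', ... toward the C terminus
    return labels


substrate = ["succinyl", "Ala", "Ala", "Pro", "Phe", "p-nitroanilide"]
labels = subsite_labels(substrate, p1_index=4)
for label, group in labels.items():
    print(label, group)
```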

Of course this advantage would apply only to His P2 substrates, but it would be reasonable if the ancestral enzyme were involved in specific proteolytic processing, for example. Regardless of the exact order of evolutionary events, our mutagenic studies show that inserting catalytic triad residues in a stepwise fashion can produce enzyme intermediates with progressively increased turnover numbers.
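The first stepwise pathway can be traced through the pH 8.60 turnover numbers. The sketch assumes the Table 1 values 3.0 × 10^-5, 2.6 × 10^-4, 2.3 × 10^-3 and 5.9 × 10^1 s^-1 for the quadruple mutant, S24C:D32A:H64A, S24C:D32A and S24C, with the powers of ten that agree with the table's ratio column. It is an illustration only.

```python
# Illustrative sketch: first stepwise pathway (Ser221, then His64, then Asp32), pH 8.60.
from math import prod

# (enzyme, kcat in s^-1) from Table 1, in order of residue installation
PATHWAY = [
    ("S24C:D32A:H64A:S221A", 3.0e-5),  # no triad residues
    ("S24C:D32A:H64A", 2.6e-4),        # + Ser221
    ("S24C:D32A", 2.3e-3),             # + His64
    ("S24C", 5.9e1),                   # + Asp32 (complete triad)
]

steps = [later[1] / earlier[1] for earlier, later in zip(PATHWAY, PATHWAY[1:])]
total = PATHWAY[-1][1] / PATHWAY[0][1]

for (name, _), gain in zip(PATHWAY[1:], steps):
    print(f"-> {name:<16} x{gain:.3g}")
print(f"product of steps {prod(steps):.2e} = overall gain {total:.2e}")
```

Because the step ratios telescope, their product always equals the overall ~2 × 10^6 gain. The point of the pathway argument is that every intermediate is more active than its predecessor. The text rounds the first two steps (about 8.7 and 8.8 here) to 8 and 9.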

In summary, when residues in the catalytic triad are altered separately or together there are large effects on turnover rate, consequent changes in the enzyme mechanism, and only minor effects on the Michaelis constant. The residues in the catalytic triad function in a strongly synergistic fashion and contribute a factor of about 2 × 10^6 to the total catalytic rate enhancement of 10^9 to 10^10. The residual activity from complete replace-
but results from transition state stabilization from contacts outside
able in terms of both evolution and function.
Human rhinoviruses, like other picornaviruses, encode a cysteine protease (designated 3C) which cleaves mainly at viral Gln-Gly pairs. There are significant areas of homology between picornavirus 3C cysteine proteases and cellular serine proteases (e.g. trypsin), suggesting a functional relationship between their catalytic regions. To test this functional relationship, we made single substitutions in human rhinovirus type 14 protease 3C at seven amino acid positions which are highly conserved in the 3C proteases of animal picornaviruses. Substitutions at either His-40, Asp-85, or Cys-146, equivalent to the trypsin catalytic triad His-57, Asp-102, and Ser-195, respectively, completely abolished 3C proteolytic activity. Single substitutions were also made at either Thr-141, Gly-158, His-160, or Gly-162, which are equivalent to the trypsin specificity pocket region. Only the mutant with a conservative Thr-141 to Ser substitution exhibited proteolytic activity, which was much reduced compared with the parent. These results, together with immunoprecipitation data which indicate that Asp-85, Thr-141, and Cys-146 lie in accessible surface regions, suggest that the catalytic mechanism of picornavirus 3C cysteine proteases is closely related to that of cellular trypsin-like serine proteases.



Human rhinoviruses (HRVs),¹ the main causative agents of the common cold, form one genus of the Picornavirus family (Stott and Killington, 1972; Gwaltney, 1975). The primary translation product of the positive-stranded RNA genome of picornaviruses (e.g. HRVs, poliovirus, and foot-and-mouth disease virus) is a single precursor polypeptide which is rapidly processed by viral proteases to mature products (Nicklin et al., 1986; Krausslich and Wimmer, 1988). Proteolytic cleavage of the viral precursor protein plays an important part in the regulation of picornavirus replication. Two Tyr-Gly pairs in the precursor are cleaved by viral protease 2A (Krausslich and Wimmer, 1988). Most of the cleavages are performed by viral protease 3C (3Cpro), which

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§ To whom all correspondence should be sent.

‡ Present address: Dept. of Microbiology and Immunology, The University of Adelaide, Box 498, GPO, Adelaide, South Australia 5001, Australia.

¹ The abbreviations used are: HRVs, human rhinoviruses; HRV-14, human rhinovirus type 14; 3Cpro, viral protease 3C; SDS-PAGE, sodium dodecyl sulfate-polyacrylamide gel electrophoresis; KLH, keyhole limpet hemocyanin; ApR, ampicillin resistant; PBS, phosphate-buffered saline.



exhibits a preference for Gln-Gly pairs (Nicklin et al., 1986; Krausslich and Wimmer, 1988).

3Cpro from poliovirus (Hanecak et al., 1984; Ivanoff et al., 1986; Richards et al., 1987; Nicklin et al., 1988), encephalomyocarditis virus (Parks et al., 1989), foot-and-mouth disease virus (Klump et al., 1984; Strebel et al., 1986) and HRV-14 (Cheah et al., 1988; Libby et al., 1988) have been cloned and expressed in Escherichia coli. In most of these studies, the 3Cpro precursor form has been shown to cleave its flanking Gln-Gly sites to release mature 3Cpro in an autocatalytic fashion. However, cleavage at Gln-Gly to release the poliovirus capsid proteins is performed not by 3Cpro but by the 3CD precursor, in which 3Cpro is covalently fused to the adjacent 3D polymerase (Jore et al., 1988; Ypma-Wong et al., 1988).

3Cpro activity is inhibited by cysteine protease inhibitors, indicating that cysteine may be an active-site amino acid (Korant, 1973; Pelham, 1978; Korant et al., 1985). In fact, sequence comparisons of 3C proteases from animal picornaviruses and 3C-like proteases from some plant viruses showed that only one of the cysteines (Cys-147 in poliovirus) is highly conserved in all these viruses (Argos et al., 1984; Franssen et al., 1984). Strong evidence that Cys-147 of poliovirus is an active-site amino acid came from site-directed mutagenesis studies, which demonstrated that mutation of the highly conserved Cys-147 to Ser inactivated the protease, whereas a similar mutation of the nonconserved Cys-153 had no effect (Ivanoff et al., 1986).

It was suggested on the basis of computer alignments that the viral 3C cysteine proteases may represent an evolutionary link between the cellular cysteine proteases exemplified by papain and the cellular trypsin-like serine proteases (Gorbalenya et al., 1986). More extensive computer alignment of picornavirus 3C proteases and cellular serine proteases revealed some remarkable primary and secondary structural homologies, indicating that certain amino acids within 3Cpro, including Cys-147 (Cys-146 in HRV-14), may be responsible for catalysis or substrate binding in a mechanistically similar fashion to the cellular serine proteases (Bazan and Fletterick, 1988). His-40, Asp-85, and Cys-146 of HRV-14 3Cpro, which are completely conserved in all picornaviruses, align with His-57, Asp-102, and Ser-195 of the trypsin-like serine protease catalytic triad (Bazan and Fletterick, 1988). As a result of these alignments, Thr-141, Gly-158, and His-160 of HRV-14 3Cpro, which are also completely conserved in all picornaviruses, and Gly-162, which is conserved in HRVs and enteroviruses (e.g. poliovirus), align with the amino acids lying in or close to the specificity pocket of the cellular serine proteases (Bazan and Fletterick, 1988). In this paper, we describe the introduction of single amino acid substitutions in HRV-14 3Cpro at the positions which correspond to the trypsin catalytic triad and specificity pocket. All except one of the substitutions destroyed the proteolytic activity of 3Cpro. In addition, monospecific peptide antisera raised against some of the regions in 3Cpro corresponding to the trypsin catalytic triad and specificity pocket efficiently immunoprecipitated 3Cpro. Our results suggest that the picornaviral 3C cysteine proteases and cellular serine proteases may catalyze peptide bond cleavage using basically similar mechanisms.

MATERIALS AND METHODS 

Oligonucleotides and Peptides — Oligonucleotides 1 to 3 (Table I) 
and the sequencing primer 5' GCGTGTTGACTGGATTT 3' (HRV-14 nucleotides 5823-5839; Stanway et al., 1984) were synthesized using a Pharmacia Gene Assembler. Oligonucleotides 4 to 9 (Table I) were purchased from Promega. Peptide 1 (CGGGTLDRNEKFRDIR, Fig. 1) and peptide 2 (RYDYATKTGQC, Fig. 1) were purchased from Diagnostic Biotechnology (Singapore) and Cambridge Research Biochemicals (United Kingdom), respectively.

Preparation and Characterization of Peptide Antisera — A non- 
natural cysteine and three glycine spacers were added to the amino 
terminus of the core peptide 1 sequence (TLDRNEKFRDIR) to 
facilitate coupling of the peptide to the carrier protein keyhole limpet 
hemocyanin (KLH) (Sigma). No additional amino acids were introduced into peptide 2 (RYDYATKTGQC), which already has a cysteine at the carboxyl end. 2.5 mg of each synthetic peptide was coupled to KLH via cysteine using m-maleimidobenzoyl-N-hydroxysuccinimide ester (Pierce Chemical Co.) (Nivison and Hanson, 1987).

To induce anti-peptide antibodies, two rabbits were subcutaneously inoculated with 100 μg of each of the KLH-coupled peptides mixed with an equal volume of Freund's complete adjuvant. Subsequent injections were carried out with the same amount of coupled peptides emulsified in Freund's incomplete adjuvant at monthly intervals. Sera were prepared from blood collected 2 weeks after each booster and kept at -70 °C.

For dot blot analysis, serially diluted peptides and KLH were
spotted onto nitrocellulose membranes (0.45 µm, Sartorius) and dried.
The membranes were incubated with 5% skim milk in phosphate-
buffered saline containing 0.05% Tween 20 (PBS-T) at 22 °C for 2 h.
The blocked membranes were then incubated with the test sera
diluted in PBS-T at 22 °C for 16 h. The membranes were washed
three times with PBS-T and incubated with biotinylated goat anti-
rabbit IgG (Bethesda Research Laboratories) at 22 °C for 1 h, then
washed again three times. The membranes were treated with strep-
tavidin-horseradish peroxidase conjugate (Bethesda Research Labo-
ratories) at 22 °C for 1 h, washed as before, and incubated with 0.33%
4-chloro-naphthol in methanol and 0.018% hydrogen peroxide in
PBS.

Maxicell Labeling and Protein Analysis — Polypeptides expressed
by plasmids in E. coli maxicell strain CSR603 (Sancar et al., 1979)
were labeled with [35S]methionine (>1200 Ci/mmol, Amersham
Corp.) according to Cheah et al. (1988), except that the cell pellet was
resuspended in lysis buffer containing 50 mM Tris-HCl, pH 7.5, 30
mM NaCl, and 200 µg/ml lysozyme. Cell lysis was achieved by three
rapid freeze-thaw cycles. The lysed cells were centrifuged for 20 min
at 4 °C, and the supernatant (soluble fraction) was saved. The pellet
(insoluble fraction) was resuspended in lysis buffer. 5 µl of the soluble
and resuspended insoluble fractions were mixed with an equal volume
of loading buffer (25 mM Tris-HCl, pH 6.8, 3% SDS, 7.5% β-mercap-
toethanol, 25% glycerol, and 0.05% bromophenol blue), boiled for 10
min, subjected to SDS-PAGE, and autoradiographed (Cheah et al.,
1988).

Immunoprecipitation — 25 µl of antiserum, diluted in 300 µl of
immunoprecipitation buffer (50 mM Tris-HCl, pH 7.4, 150 mM NaCl,
and 2% Triton X-100), was preabsorbed with KLH and unlabeled
E. coli maxicell extract at 22 °C for 2 h. 20 µl of [35S]methionine-
labeled E. coli maxicell extract was then added to the preabsorbed
antiserum and mixed at 4 °C for 17 h. 100 µl of protein A-Sepharose
CL-4B (Pharmacia LKB Biotechnology Inc.) was added, mixed for a
further 1 h, and centrifuged. The pellet was washed three times with
immunoprecipitation buffer and once with 10 mM Tris-HCl, pH 7.5, resus-
pended in 50 µl of loading buffer, boiled for 10 min, and analyzed by
SDS-PAGE.

For the analysis of gel-purified polypeptides, [35S]methionine-la-
beled polypeptides were separated by SDS-PAGE (Cheah et al., 1988).
The gel was rinsed with NT buffer (25 mM Tris-HCl, pH 7.4, and 25
mM NaCl), immediately dried, and autoradiographed. The areas of
the gel corresponding to the 3Cpro precursor and the 20-kDa 3Cpro
were cut out and soaked in NT buffer at 4 °C for 17 h. The superna-
tant, containing diffused proteins, was immunoprecipitated as de-
scribed above and analyzed by SDS-PAGE.

Site-directed Mutagenesis and DNA Sequencing — The mutagenesis
protocol was essentially as described by Kunkel et al. (1987) using
the Muta-Gene® M13 in vitro mutagenesis kit (Bio-Rad). First, an M13
recombinant was constructed, consisting of the entire plasmid
pKCC110 (Cheah et al., 1988) subcloned in the PstI site of bacterio-
phage M13 mp19 to give pLC177. To prevent deletion of the insert,
a plaque picked directly from the transformation was grown for 6 h
in 6 ml of 2× TY medium, and the single-stranded DNA was purified as
follows: 5 ml of culture supernatant from a 10-min centrifugation was
mixed with 0.65 ml of 20% polyethylene glycol 6000 and 2.5 M NaCl.
After 15 min at 22 °C, the phage was collected by centrifugation (10
min) and the pellet dissolved in 250 µl of 20 mM Tris-HCl, pH 8.0, 1
mM EDTA. DNA was isolated by two phenol extractions and one
chloroform extraction, then precipitated with ethanol.

The template DNA for mutagenesis, uracil-enriched pLC177 sin-
gle-stranded DNA, was obtained by retransforming the recombinant
single-stranded phage DNA (pLC177) into the dut- ung- E. coli
strain CJ236 (Kunkel et al., 1987) and purifying the single-stranded
DNA as above.

The annealing of the mismatching oligonucleotides (Table I) to
the template DNA and polymerization with T4 DNA polymerase in
the presence of T4 gene 32 protein were performed essentially ac-
cording to the manufacturer's instructions (Bio-Rad Muta-Gene® kit),
except that the polymerization reaction was incubated at 25 °C for
18 h following the recommended incubations at 4, 25, and 37 °C. The
resultant closed, circular DNA was transformed into the ung+ E. coli
strain MV1190, and four independent plaques from each mutagenesis
mixture were screened for the correct mutation by dideoxy sequencing



Table I
Mutations generated by site-directed mutagenesis

No.  Mutagenic oligonucleotide (5'→3')   Location on HRV-14 cDNA(a)  Amino acid substitution(b)  Predicted role of amino acid(c)
1    CACCTCCAGACTGCCCAG                  5663-5680                   Cys-146→Ser (pAC304)        Catalysis
2    CACAGCACACCTCCCATCTGCCCAGTTTTTG     5657-5687                   Cys-146→Met (pAC305)        Catalysis
3    CACAGCACACCTCCAGTCTGCCCAGTTTTTG     5657-5687                   Cys-146→Thr (pAC306)        Catalysis
4    GCTGTGCGTCTGTGGGTATC                5343-5362                   His-40→Asp (pAC307)         Catalysis
5    CCCTGATAGCTCTGAATTTTTC              5476-5497                   Asp-85→Ala (pAC308)         Catalysis
6    CCCAGTTTTTGATGCATAATCATAAC          5642-5667                   Thr-141→Ser (pAC309)        Base of specificity pocket
7    CAACATGAATATCAAAGATCTTAC            5696-5719                   Gly-158→Asp (pAC310)        Highly conserved
8    CGCCAACATTAATACCAAAGATC             5700-5722                   His-160→Asn (pAC311)        Side of specificity pocket
9    CTTCCATTACCGTCAACATGAATAC           5708-5732                   Gly-162→Asp (pAC312)        Top of specificity pocket

(a) Nucleotide numbers are based on the published HRV-14 sequence (Stanway et al., 1984).
(b) Plasmid names are shown in parentheses (see text for details).
(c) According to the alignment with trypsin (Bazan and Fletterick, 1988).
(Sanger et al., 1977) using the primer 5' GCGTGTTGACTGGATTT
3'.
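To see how each mutagenic oligonucleotide in Table I encodes its substitution, the Python sketch below reverse-complements an oligonucleotide and translates it with the standard genetic code. Two points are our assumptions, inferred by inspection rather than stated in the text: that the oligonucleotides are written as the complement of the coding strand, and the reading-frame offset chosen for each one (picked so that the flanking residues match peptide 1 or peptide 2). The block also checks each oligonucleotide's length against its Table I span, taking the spans of oligonucleotides 3 and 5 as 5657-5687 and 5476-5497. Oligonucleotides 4 and 7 to 9 are left out because neither peptide covers them, so their frames cannot be cross-checked this way.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, offset: int) -> str:
    """Translate whole codons starting at `offset`; a trailing partial codon is dropped."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


# Table I oligonucleotides: (sequence 5'->3', first nt, last nt, reading-frame offset).
# Assumptions: each oligo is antisense to the coding strand, and the offsets were
# picked by inspection so that the flanking residues match peptide 1 or peptide 2.
OLIGOS = {
    "1 Cys-146->Ser": ("CACCTCCAGACTGCCCAG", 5663, 5680, 2),
    "2 Cys-146->Met": ("CACAGCACACCTCCCATCTGCCCAGTTTTTG", 5657, 5687, 2),
    "3 Cys-146->Thr": ("CACAGCACACCTCCAGTCTGCCCAGTTTTTG", 5657, 5687, 2),
    "5 Asp-85->Ala": ("CCCTGATAGCTCTGAATTTTTC", 5476, 5497, 0),
    "6 Thr-141->Ser": ("CCCAGTTTTTGATGCATAATCATAAC", 5642, 5667, 2),
}

for name, (oligo, first, last, offset) in OLIGOS.items():
    assert len(oligo) == last - first + 1, name  # oligo length matches its cDNA span
    print(name, translate(reverse_complement(oligo), offset))
```

Read against peptide 2 (...K-T-G-Q-C146) and peptide 1 (...E-K-F-R-D85-I-R), each translation differs from the wild-type context only at the targeted residue: oligonucleotides 2 and 3 give K-T-G-Q-M and K-T-G-Q-T, and oligonucleotide 5 gives E-K-F-R-A-I-R.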

To regenerate plasmids equivalent to the parental plasmid
pKCC110, the mutant derivatives of pLC177 were digested with PstI
(Amersham Corp.), and the linear DNA was allowed to self-ligate.
The DNA was transformed into E. coli strain MC1022, and ampicillin-
resistant (ApR) transformants were selected (Maniatis et al., 1982).
Finally, the mutant plasmid DNAs were retransformed into E. coli
CSR603 maxicells for analysis of plasmid-encoded proteins (see
above).

RESULTS 

Immunoprecipitation of 3Cpro and Its ~55-kDa Precursor —
The predicted HRV-14 3Cpro amino acid sequence (Stanway
et al., 1984) was analyzed for short peptide regions with a
good potential for inducing antibodies that would recognize
surface epitopes in 3Cpro (Garnier et al., 1978; Lerner, 1984).
The analysis predicted that amino acids 76 to 87 and 136 to
146 (peptides 1 and 2, respectively, Fig. 1) lie in hydrophilic
turn regions in the protein, in agreement with Werner
et al. (1986). These peptides were therefore chosen for raising
antisera. Two rabbits were independently immunized with
each peptide coupled to KLH. Sera from each pair of rabbits
reacted with the homologous peptide in a dot blot assay, and
no cross-reactivity was detected with the heterologous pep-
tides. Preimmune sera from all four rabbits gave no reaction
with either peptide (not shown).
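The regions were chosen with the predictive methods of Garnier et al. (1978) and Lerner (1984), which are not reproduced here. As a rough, swapped-in illustration only, the sketch below scores the two selected sequences with the Kyte-Doolittle hydropathy scale, a different and simpler method: a negative mean (GRAVY) value indicates a hydrophilic stretch. Because the full 3Cpro sequence is not given in this text, the sketch cannot compare these regions with the rest of the protein or assess turn propensity.

```python
# Kyte-Doolittle hydropathy index per residue (positive = hydrophobic).
# This scale is our substitute for illustration, not the prediction method used in the paper.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def mean_hydropathy(seq: str) -> float:
    """Grand average of hydropathy (GRAVY) over the whole sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


CANDIDATES = {
    "peptide 1 core (76-87)": "TLDRNEKFRDIR",
    "peptide 2 (136-146)": "RYDYATKTGQC",
}

for name, seq in CANDIDATES.items():
    print(f"{name}: {mean_hydropathy(seq):+.2f}")
```

Both sequences score clearly negative on this scale, which is at least consistent with the hydrophilic character the authors predicted.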

We have previously reported the construction of an HRV-14
expression plasmid, pKCC110, which codes for 3Cpro plus some
flanking viral sequences. In E. coli maxicells, pKCC110 en-
codes a unique precursor polypeptide of ~55 kDa, which was
suggested on the basis of its size to comprise the carboxyl-
terminal portion of the viral RNA-linked protein VPg (3B),
the entire 3Cpro, and the amino-terminal half of the viral
polymerase 3D (Fig. 1; Cheah et al., 1988). The ~55-kDa
3Cpro precursor is rapidly processed to several polypeptides,
including 3Cpro migrating at ~20 kDa (Cheah et al., 1988).

Fig. 2A shows that in extracts of [35S]methionine-labeled
E. coli maxicells harboring pKCC110, 3Cpro and the ~55-kDa
3Cpro precursor are more abundant in the insoluble pellet than
in the lysozyme (soluble) extract (Fig. 2A, compare lanes 2
and 3). A background protein comigrating with the ~55-kDa
band is occasionally detected in the soluble fraction of maxi-
cells carrying the vector pKCC100 (Fig. 2A, lane 4).

Immunoprecipitation experiments using the soluble frac-
tion (lysozyme supernatant; Fig. 2A, lane 3) demonstrated
that peptide 1 and 2 antisera specifically recognize the 20-
kDa 3Cpro polypeptide (Fig. 2B, lanes 2 and 5), whereas the
preimmune sera did not (Fig. 2B, lanes 3 and 6). The ~55-
kDa 3Cpro precursor from the soluble fraction of E. coli was
not immunoprecipitated by either peptide antiserum (Fig. 2B,
lanes 2 and 5).

To circumvent the lack of immunoprecipitation of the ~55-
kDa 3Cpro precursor protein, the [35S]methionine-labeled pro-
teins encoded by pKCC110 in E. coli maxicells were separated
by SDS-PAGE, and the gel was immediately dried and auto-
radiographed without fixing the proteins. The regions corre-
sponding to the ~55-kDa 3Cpro precursor and 3Cpro (Fig. 2A,
lane 1) were excised from the dried gel and eluted by diffusion
at 4 °C. The eluted proteins were either rerun on a second
SDS-polyacrylamide gel (Fig. 2C, lanes 1 and 6) or incubated
with peptide 1 and 2 antisera and immunoprecipitated. Both
peptide antisera immunoprecipitated the ~55-kDa 3Cpro pre-
cursor (Fig. 2C, lanes 2 and 3) and 3Cpro (Fig. 2C, lanes 7 and
8), whereas preimmune sera did not (Fig. 2C, lanes 4, 5, 9,
and 10). Further, the immunoprecipitation of the gel-purified
3Cpro precursor by both peptide antisera was inhibited by prior
absorption of the peptide antisera with 10 µg of the homolo-
gous peptide (not shown).

Taken together, the immunoprecipitation experiments con-
firmed our previous assignment of the ~55- and ~20-kDa
polypeptides as the 3Cpro precursor and 3Cpro, respectively (Cheah
et al., 1988), and clearly indicate that amino acids 76 to 87 and
136 to 146 are surface epitopes of 3Cpro (Fig. 1).

Construction of 3Cpro Mutants — Computer alignments of
animal picornavirus 3C proteases and cellular serine proteases
have indicated a limited number of significant homologies.
The presumed active-site Cys-147 of poliovirus 3Cpro, equiv-
alent to Cys-146 in HRV-14 3Cpro, is highly conserved in all
animal picornaviruses and lies in an area of significant ho-
mology with the active-site Ser-195 of trypsin-like serine
proteases (Gorbalenya et al., 1986; Bazan and Fletterick,
1988). In addition, His-40 and Asp-85 of HRV-14 (Stanway
et al., 1984) are highly conserved in animal picornaviruses
and cellular serine proteases. His-40, Asp-85, and Cys-146 of
HRV-14 can be superimposed on the trypsin serine protease
catalytic triad, His-57, Asp-102, and Ser-195 (Fig. 3; Kraut,
1977; Craik et al., 1987; Sprang et al., 1987). Therefore,
substitutions were made individually at His-40 and Asp-85,
and three different substitutions were made at Cys-146, to test
whether these amino acids are essential for the catalytic
function of 3Cpro (Table I, Fig. 1).

Fig. 1. Schematic diagram showing the HRV-14 portion of recombinant plasmid pKCC110. The
heavy blackened line represents the cDNA of HRV-14 cloned in the trp promoter expression vector pKCC100, and
the hatched box depicts the 19 amino acids derived from vector sequences fused in frame to the HRV-14 open
reading frame (Cheah et al., 1988). The proposed Gln/Gly cleavage sites flanking 3Cpro are shown as Q/G(2) and
Q(182)/G (Stanway et al., 1984; Cheah et al., 1988). Peptide sequences chosen for raising antibodies, shown as open
boxes, are P1 (peptide 1, amino acids 76 to 87 with an amino-terminal extension of Cys-Gly-Gly-Gly) and P2
(peptide 2, amino acids 136 to 146). The full sequences of the peptides are given under "Materials and Methods."
The locations of the amino acids substituted by site-directed mutagenesis are shown in single-letter code (see text
and Table I for details). The viral proteins and their precursors (3B, 3C, 3D, 3C-3D, and 3B-3C-3D) are shown
with the estimated sizes in parentheses (Stanway et al., 1984; Cheah et al., 1988). Truncated proteins are indicated
by overlining (e.g. 3D, the truncated polymerase).

Fig. 2. Protein analysis. A, autoradiograph of a 12.5% SDS-polyacrylamide gel showing [35S]methionine-
labeled HRV-14 polypeptides synthesized in E. coli CSR603 maxicells. Lane 1, pKCC110 (whole lysate); lane 2,
pKCC110 (solubilized pellet fraction); lane 3, pKCC110 (soluble fraction extracted with lysozyme); lane 4, vector
pKCC100 without insert (soluble fraction extracted with lysozyme). Unique polypeptides encoded by recombinant
plasmid pKCC110 are indicated on the left (Fig. 1; Cheah et al., 1988). Bla is β-lactamase. B, immunoprecipitation
of protease 3C by peptide antisera. [35S]Methionine-labeled soluble proteins encoded by pKCC110 were either
loaded directly on the SDS-polyacrylamide gel (lanes 1 and 4), immunoprecipitated with peptide 1 antiserum (lane
2), or immunoprecipitated with peptide 2 antiserum (lane 5). Lanes 3 and 6 are identical to lanes 2 and 5,
respectively, except that preimmune sera were used. The arrowheads on the right of panels A and B indicate the
positions of size standards from top to bottom of sizes 68, 43, 25.7, and 18.4 kDa. C, immunoprecipitation of SDS-
polyacrylamide gel-purified 3Cpro precursor (left panel) and 3Cpro (right panel). The regions in the gel (Fig. 2A, lane
1) corresponding to the 3Cpro precursor and 3Cpro were excised, and the proteins were eluted and analyzed on a
12.5% SDS-polyacrylamide gel. Lanes 1 and 6, proteins loaded directly; lanes 2 and 7, immunoprecipitation with
peptide 1 antiserum; lanes 3 and 8, immunoprecipitation with peptide 2 antiserum; lanes 4, 5, 9, and 10,
immunoprecipitation with preimmune sera.

Fig. 3. Proposed alignment of catalytic and specificity pocket amino acids of trypsin and HRV-14
3Cpro. Computer alignment of the catalytic triad and specificity pocket amino acids of trypsin with the
corresponding residues of HRV-14 3Cpro is shown (Bazan and Fletterick, 1988). Amino acids in HRV-14 3Cpro
substituted by site-directed mutagenesis (Fig. 1, Table I) are shown in bold type. Based on our results, T-141 and
not A-140 of 3Cpro may be equivalent to D-189 of trypsin (see "Discussion"). Identical amino acids are boxed.



The computer alignments also revealed that HRV-14 3Cpro
amino acids Thr-141, His-160, and Gly-162 lie in positions
equivalent to serine protease amino acids known to be impor-
tant for substrate binding and specificity (Fig. 3; Kraut, 1977;
Bazan and Fletterick, 1988). In trypsin, the equivalent amino
acids are serine, valine, and tryptophan, respectively (Fig. 3).
Thr-141 and His-160 are highly conserved in picornaviruses,
while Gly-162 is only partially conserved. Two lines of evi-
dence suggest that these three residues are among those which
are important determinants of Gln-Gly cleavage specificity.
First, molecular modeling of His-160/Gly-162 in the pocket
of a trypsin-inhibitor complex structure revealed possible
hydrogen-bonding interactions between viral Thr-141/His-
160 and the enzyme-bound side chain of the Gln substrate



7184 



Mutational Analysis of a Picornavirus 3C Protease



(designated the S1 position) (Kraut, 1977; Bazan and Fletterick,
1988). Second, Staphylococcus aureus (strain V8) protease,
which is a serine protease with a specificity for Glu in the S1
pocket, has a Thr-141/His-160/Gly-162 complement of resi-
dues (Drapeau, 1978; Bazan and Fletterick, 1988). Thus,
changes were made individually at Thr-141, His-160, and Gly-
162 (summarized in Table I) to test whether these residues
are essential for cleavage at Gln-Gly. In addition, Gly-158 was
chosen for mutagenesis as an example of a very highly con-
served residue occurring in the vicinity of the predicted spec-
ificity pocket (Fig. 3, Table I).

Single amino acid substitutions in HRV-14 3Cpro were gen-
erated via site-directed mutagenesis (Kunkel et al., 1987)
using synthetic oligonucleotide primers (Table I). The single-
stranded DNA template was prepared by subcloning the entire
PstI-linearized plasmid pKCC110 into the PstI site of M13
mp19 to give pLC177 (Fig. 4). Following site-directed muta-
genesis and DNA sequencing, the M13 mp19 segment of
pLC177 derivatives bearing mutations in 3Cpro was deleted by
PstI digestion, followed by self-ligation for the reconstruction
[Fig. 4 diagram: PstI digestion of pLC177 mutant derivatives; self-ligation; selection for ApR; pAC series (ApR)]

Fig. 4. Scheme for site-directed mutagenesis. The recombi-
nant plasmid coding for 3Cpro and flanking sequences (pKCC110,
blackened lines) and M13 mp19 (double lines) were digested with PstI
and ligated together, yielding pLC177. The open arrowheads denote
the trp promoter and ribosome-binding site of pKCC110 (Cheah et
al., 1988). Site-directed mutagenesis and sequencing of mutants are
described in detail under "Materials and Methods." The mutant
derivatives of pLC177 were digested with PstI, and the DNA was
allowed to self-ligate, generating the pAC series of mutant plasmids
(Table I). The asterisk denotes a site-specific 3Cpro mutant.



of the ApR gene and selection for ApR transformants. This
manipulation regenerated mutant plasmids equivalent to
pKCC110 (pAC series, Fig. 4; Table I).

Expression and Proteolytic Activity of Mutant 3C Pro-
teases — The expression in E. coli of 3C proteases linked to
the adjacent upstream and downstream viral flanking se-
quences provides an immediate assay for the activity of the
protease (Hanecak et al., 1984; Klump et al., 1984; Cheah et
al., 1988). The precursor form of HRV-14 3Cpro releases ma-
ture 3Cpro by autocatalytic proteolysis (Stanway et al., 1984;
Cheah et al., 1988; Figs. 1 and 2A). It is most likely that HRV-
14 3Cpro is released by proteolysis at its flanking Gln-Gly sites,
as found for poliovirus, since short synthetic peptides have been
shown to be efficiently cleaved at Gln-Gly by cloned
HRV-14 3Cpro (Libby et al., 1988).

A comparison of the expression of parental and mutant
HRV-14 3Cpro precursors in E. coli maxicells is presented in
Fig. 5. In the case of the parental 3Cpro, significant processing
of the 3Cpro (55 kDa) precursor to 3D (31 kDa) and 3Cpro (20
kDa) was observed during the 1-h labeling period (Fig. 5, lanes
2 and 6; Cheah et al., 1988). The doublet migrating at ~46
kDa probably consists of unrelated plasmid-encoded proteins,
since it is present in the vector control (Fig. 5, lane 1) and
the yields are highly variable (see also Fig. 6). All nine mutant
plasmids, each of which codes for a single amino acid substi-
tution in 3Cpro, expressed a precursor polypeptide of identical
size (55 kDa) that migrated slightly more slowly than the 3Cpro
precursor encoded by the parent plasmid pKCC110 (Fig. 5,
lanes 3-5 and 7-12). However, none of the mutant precursors,
with the exception of the Thr-141 to Ser mutant, was cleaved
to 3D and mature 3Cpro, demonstrating that their catalytic
function had been destroyed. The fact that eight independent
point mutations at six amino acid positions completely inhibit
processing at two Gln-Gly sites makes it highly unlikely that
E. coli proteases are involved in specific proteolysis of the
parental 3Cpro precursor in the E. coli maxicell system.
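The count behind the last argument (nine mutants, eight of them inactive, at six distinct positions) can be tallied directly from Table I. In the sketch below each mutant is a small record; the `cleaved` flag is our shorthand for the Fig. 5 outcome as described in the text (only the Thr-141 to Ser precursor gave any 3D and mature 3Cpro), not a field reported by the authors.

```python
from typing import NamedTuple


class Mutant(NamedTuple):
    position: str      # wild-type residue and number in HRV-14 3Cpro
    substitution: str  # replacement residue
    plasmid: str       # pAC plasmid name from Table I
    cleaved: bool      # processed to 3D and mature 3Cpro in Fig. 5 (our shorthand)


MUTANTS = [
    Mutant("Cys-146", "Ser", "pAC304", False),
    Mutant("Cys-146", "Met", "pAC305", False),
    Mutant("Cys-146", "Thr", "pAC306", False),
    Mutant("His-40", "Asp", "pAC307", False),
    Mutant("Asp-85", "Ala", "pAC308", False),
    Mutant("Thr-141", "Ser", "pAC309", True),   # severely impaired, but not abolished
    Mutant("Gly-158", "Asp", "pAC310", False),
    Mutant("His-160", "Asn", "pAC311", False),
    Mutant("Gly-162", "Asp", "pAC312", False),
]

inactive = [m for m in MUTANTS if not m.cleaved]
inactive_positions = sorted({m.position for m in inactive})

print(f"{len(MUTANTS)} mutants, {len(inactive)} inactive, at {len(inactive_positions)} positions")
print(", ".join(inactive_positions))
```

The three substitutions at Cys-146 are what make eight inactive mutants map onto only six positions.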

The Thr-141 to Ser mutation severely impairs processing,
since very little 3D and 3Cpro were detected (Fig. 5, lane 12).
The 3Cpro (Ser-141) precursor occurred as a doublet with
bands of equal intensity, unlike the other mutants, which
expressed only the upper band (Fig. 5, compare lane 12 with lanes
7-11). These observations provide an explanation for the
parental 3Cpro precursor migrating slightly faster in SDS-
polyacrylamide gels than the proteolytically inactive mutant
3Cpro precursors (Fig. 5, e.g. compare lanes 2 and 6 with lanes
3 and 7). With the parental 3Cpro precursor (55 kDa), fast
cleavage at the 3B/3C junction and slower cleavage at the 3C/
3D junction (Fig. 1) result in the accumulation of a 52.8-kDa
3C-3D precursor (Fig. 5, lanes 2 and 6). In support of this
explanation, a longer autoradiographic exposure of lanes 2
and 6 of the gel shown in Fig. 5 revealed the presence of the
authentic 55-kDa parental precursor comigrating with the 55-
kDa precursor of the proteolytically inactive mutants (not
shown). Therefore, the 3Cpro precursor encoded by pKCC110,
previously designated "~55 kDa," most probably consisted of
the 52.8-kDa 3C-3D precursor and a small amount of 55-kDa
3B-3C-3D (Fig. 1; Cheah et al., 1988). The longer exposure of
the gel shown in Fig. 5 also did not reveal detectable 52.8-
(3C-3D), 31- (3D), or 20-kDa (3Cpro) bands with the proteolyt-
ically inactive mutants, confirming that the catalytic function of
3Cpro had been destroyed.

Pulse-chase Analysis of Polypeptides Expressed by Mutant
Plasmids — To examine whether the mutant 55-kDa precur-
sors exhibit 3Cpro catalytic activity during prolonged incuba-
tions, a series of pulse-chase experiments was performed. Fig.
6A shows that following a 2-min [35S]methionine pulse and a



Fig. 5. Polypeptides encoded by protease 3C mutant plasmids. The [35S]methionine-labeled polypeptides
in the whole extracts of E. coli CSR603 harboring various recombinant plasmids (Table I) were separated by SDS-
PAGE. Lane 1, the vector pKCC100; lane 2, pKCC110 (parent); lane 3, pAC304 (Cys-146 to Ser); lane 4, pAC305
(Cys-146 to Met); lane 5, pAC306 (Cys-146 to Thr); lane 6, pKCC110 (parent); lane 7, pAC307 (His-40 to Asp);
lane 8, pAC310 (Gly-158 to Asp); lane 9, pAC311 (His-160 to Asn); lane 10, pAC312 (Gly-162 to Asp); lane 11,
pAC308 (Asp-85 to Ala); lane 12, pAC309 (Thr-141 to Ser). Arrows on the right show the positions of protein
markers with sizes from top to bottom of 68, 43, 25.7, and 18.4 kDa. Indicated on the left are the pKCC110-encoded
viral polypeptides 3B-3C-3D (55 kDa), 3C-3D (52.8 kDa), 3D (31 kDa), and 3C (20 kDa) (Fig. 1). Bla is β-
lactamase.

4-h chase with unlabeled methionine and chloramphenicol,
nearly all the parental 3Cpro precursor was processed to 3D
and 3Cpro (see also Fig. 5 of Cheah et al., 1988). In contrast,
no processing of the 3B-3C-3D precursor to 3C-3D and
3Cpro was detected with the Asp-85 to Ala mutant, even during
an 18-h chase period (Fig. 6B). An identical result was ob-
tained with the His-40 to Asp, Cys-146 to Ser, Cys-146 to
Met, Cys-146 to Thr, Gly-158 to Asp, His-160 to Asn, and
Gly-162 to Asp mutants (not shown). With the Thr-141 to
Ser mutant, the 3B-3C-3D/3C-3D doublet was processed dur-
ing the chase period to 3D and a 3Cpro mutant polypeptide
(Fig. 6C), albeit at a much slower rate than the parental
3Cpro precursor (Fig. 6A). These results strengthen our con-
clusion that mutations at six amino acid positions totally
inactivate 3Cpro and that mutation of Thr-141 to Ser severely
impairs 3C proteolytic activity.

DISCUSSION 

We have previously used the E. coli maxicell system to
demonstrate expression and autocatalytic proteolysis of an
HRV-14 3Cpro precursor (Cheah et al., 1988). In the present
study, the parental and mutant 3Cpro precursors were ex-
pressed at comparable levels in E. coli maxicells, but the
parental precursor migrated slightly faster in denaturing gels
than the proteolytically inactive mutant precursors (Fig. 5).
This is because cleavage of the parental 3B-3C-3D precursor
is much faster at the 3B/3C junction than at the 3C/3D
junction, resulting in the accumulation of a 3C-3D precursor
of 52.8 kDa (Fig. 1). In other picornaviruses, cleavage at 3B/
3C has also been reported to be faster than cleavage at 3C/
3D (Strebel et al., 1986; Richards et al., 1987; Jore et al., 1988).
In vivo, slow cleavage at 3C/3D would control the release of
mature 3Cpro and at the same time provide an adequate supply
of 3C-3D, the active protease required for cleavage of the
capsid protein precursors (Jore et al., 1988; Ypma-Wong et
al., 1988).
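Why a fast 3B/3C cleavage followed by a slow 3C/3D cleavage makes the 3C-3D intermediate pile up can be illustrated with the textbook model of two consecutive first-order reactions, A → B → C with rate constants k1 and k2, whose solution is A(t) = e^(-k1 t) and B(t) = k1/(k1 - k2) (e^(-k2 t) - e^(-k1 t)). Mapping A to 3B-3C-3D, B to 3C-3D, and C to fully cleaved products is our simplification, not a model fitted by the authors: it treats both cleavages as irreversible and first order and ignores cleavage in the opposite order. The rate constants are hypothetical and chosen only so that k1 is much larger than k2.

```python
import math
from typing import Tuple


def consecutive_first_order(k1: float, k2: float, t: float) -> Tuple[float, float, float]:
    """Fractions of A -> B -> C at time t, starting from pure A (requires k1 != k2).

    A = 3B-3C-3D precursor, B = 3C-3D intermediate, C = cleaved at both junctions.
    """
    a = math.exp(-k1 * t)
    b = k1 / (k1 - k2) * (math.exp(-k2 * t) - math.exp(-k1 * t))
    c = 1.0 - a - b
    return a, b, c


# Hypothetical rate constants (per min): fast 3B/3C cleavage (K1), slow 3C/3D cleavage (K2).
K1, K2 = 1.0, 0.1

for t in (0, 1, 5, 30, 240):
    a, b, c = consecutive_first_order(K1, K2, t)
    print(f"t = {t:>3} min   3B-3C-3D {a:.3f}   3C-3D {b:.3f}   fully cleaved {c:.3f}")

# The intermediate peaks where dB/dt = 0, i.e. at t = ln(k1/k2) / (k1 - k2).
t_peak = math.log(K1 / K2) / (K1 - K2)
print(f"3C-3D peaks at t = {t_peak:.2f} min, fraction {consecutive_first_order(K1, K2, t_peak)[1]:.3f}")
```

With these illustrative constants most of the material passes through a transient 3C-3D pool before being fully processed, which is the qualitative behavior invoked above; the numbers themselves carry no experimental meaning.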

The E. coli maxicell system has for the first time provided
a sensitive, convenient, and rapid way of assaying the effects
of single amino acid substitutions on the proteolytic activity
of autocatalytic proteases. Seven amino acid positions in
HRV-14 3Cpro were chosen for site-directed mutagenesis based
on two considerations. First, amino acids at all seven positions
are highly conserved in animal picornaviruses. Second, an
alignment with trypsin predicted that certain 3Cpro residues
may be involved either in catalysis or in substrate binding and
specificity (Fig. 3; Bazan and Fletterick, 1988). It has previ-
ously been shown that the Cys-147 to Ser mutation inactivates
poliovirus 3Cpro, although it was not clear whether residual
proteolytic activity remained (Ivanoff et al., 1986). Here we
show that when Cys-146 of HRV-14 3Cpro (equivalent to poliovirus
Cys-147) was changed to serine, methionine, or threo-
nine, proteolytic activity was completely destroyed. Likewise,
mutation of His-40 to Asp or Asp-85 to Ala, residues which are
equivalent to His-57 and Asp-102 in the catalytic triad of the
trypsin-like serine proteases, completely destroyed 3Cpro ac-
tivity. Two different antisera raised against peptides contain-
ing 3Cpro amino acids 76 to 87 and 136 to 146 efficiently
immunoprecipitated mature 3Cpro, strongly suggesting that
Asp-85 and Cys-146 lie in accessible surface locations in 3Cpro.
Taken together, the site-directed mutagenesis and immuno-
precipitation data suggest that catalysis by HRV-14 3Cpro is
performed by a surface triad of His-40, Asp-85, and Cys-146
in a mechanistically similar fashion to the histidine, aspartic
acid, and serine at the active site of the trypsin-like serine
proteases (Fig. 3; Kraut, 1977; Craik et al., 1987).
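The surface-location argument depends on Asp-85 and Cys-146 falling inside the immunizing peptides. The hypothetical helper `residue_at` below maps a 3Cpro residue number onto peptide 1 (residues 76 to 87) or peptide 2 (136 to 146) and returns the one-letter code found there. This also checks that the sequences given under "Materials and Methods" agree with the residue names used in the text (Asp-85, Ala-140, Thr-141, Cys-146). It is a lookup over the two stated sequences only; it says nothing about residues outside them, such as His-40 or His-160.

```python
from typing import Optional, Tuple

# Immunizing peptide cores: name -> (first 3Cpro residue covered, sequence).
PEPTIDES = {
    "peptide 1": (76, "TLDRNEKFRDIR"),   # Cys-Gly-Gly-Gly extension omitted
    "peptide 2": (136, "RYDYATKTGQC"),
}


def residue_at(position: int) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: (peptide name, one-letter residue) covering `position`, or None."""
    for name, (first, seq) in PEPTIDES.items():
        if first <= position < first + len(seq):
            return name, seq[position - first]
    return None


for pos in (40, 85, 140, 141, 146, 160):
    print(pos, residue_at(pos))
```

Residues 85 and 146 come back as D in peptide 1 and C in peptide 2, while His-40 and His-160 lie outside both peptides, so the antisera bear directly only on the surface exposure of Asp-85 and Cys-146.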

A very recent independent alignment of viral cysteine and
cellular serine proteases (Gorbalenya et al., 1989) is largely in
agreement with the analysis of Bazan and Fletterick (1988),
except that Glu-71 and not Asp-85 was suggested to represent
the acidic amino acid in the catalytic triad of HRV-14 and
most other picornavirus 3C proteases. Although a glutamic
acid has never been found in the serine protease catalytic
triad and some 3C proteases have Asp-71, the participation
of position 71 in the catalytic triad of 3C cysteine proteases
cannot be ruled out.

Amino acids in viral 3C proteases predicted to be involved
in determining Gln-Gly cleavage specificity include the HRV-
14 residues Ala-140, Thr-141, Gly-158, His-160, and Gly-162
Fig. 6. Kinetics of cleavage of parent and mutant protease
3C precursors. Viral polypeptides expressed in UV-irradiated E.
coli maxicells were labeled with [35S]methionine for 2 min and chased
for the times indicated at 37 °C in the presence of excess unlabeled
methionine and chloramphenicol (Cheah et al., 1988). Panel A,
pKCC110 (parent); panel B, pAC308 (Asp-85 to Ala); panel C, pAC309
(Thr-141 to Ser). Arrows show the positions of protein markers with
sizes from top to bottom of 68, 43, 25.7, and 18.4 kDa. Indicated on
the left are the viral polypeptides 3B-3C-3D (55 kDa), 3C-3D (52.8
kDa), 3D (31 kDa), and 3C (20 kDa) (Fig. 1). Bla is β-lactamase.

(Fig. 3; Bazan and Fletterick, 1988; Gorbalenya et al., 1989).
Ala-140 in HRV-14 3Cpro aligns with Asp-189 of trypsin, an
important determinant of Arg/Lys cleavage specificity located
at the base of the substrate binding pocket (Graf et al., 1987).
However, Ala-140 is unlikely to be directly involved in 3Cpro
specificity, since other picornaviruses have the functionally
dissimilar residues Gln, Asn, Glu, or Pro in this position. We
found that Gly-158 to Asp, His-160 to Asn, and Gly-162 to
Asp substitutions abolished 3Cpro activity, supporting the
proposal that each of the amino acids in these positions plays
a crucial role in cleavage specificity (Bazan and Fletterick,
1988). Consistent with our results, when His-161 of poliovirus
3Cpro (equivalent to His-160 of HRV-14) was converted to a
glycine, proteolytic activity was also lost (Ivanoff et al.,
1986). The Thr-141 to Ser mutation in HRV-14 3Cpro mark-
edly reduced its activity. Our immunoprecipitation data sug-
gest that Thr-141 lies in an accessible surface region and, as
discussed earlier, Thr-141 could form a hydrogen bond with
the side chain of the S1-bound Gln substrate. In theory, Ser-
141 could similarly form a hydrogen bond, but the interaction
would be weaker, since serine has a smaller side chain than
threonine. A weaker interaction might explain the impaired
activity of the Ser-141 mutant. Based on these considerations,
we speculate that Thr-141 and not Ala-140 of 3Cpro is equiv-
alent to the important Asp-189 of trypsin (Fig. 3; Graf et al.,
1987).
It is remarkable that substitutions at six positions in 3Cpro
completely destroyed proteolytic activity, and one additional
substitution (Thr-141 to Ser) severely impaired activity. It
could be argued that 3C proteases are highly sensitive to
structural changes. Although we cannot exclude this possibil-
ity, two considerations argue against it. First,
some substitutions in poliovirus 3Cpro are without effect (Ivan-
off et al., 1986; Dewalt and Semler, 1987). Second, the 3C
proteases of two related HRV subtypes, HRV-2 and HRV-14,
are less than 50% homologous, and structurally dissimilar
amino acids align at many positions (Stanway et al., 1984;
Skern et al., 1985).

We have demonstrated that seven amino acids which are
highly conserved in the 3C proteases of animal picornaviruses
are important for the proteolytic activity of HRV-14 3Cpro.
These amino acids align with catalytic or specificity pocket
residues of trypsin, suggesting that the catalytic mechanism
used by picornavirus 3C cysteine proteases is closely re-
lated to that of the cellular trypsin-like serine proteases. This
is interesting because trypsin and chymotrypsin are inactive
as precursors, in sharp contrast to the viral 3C
proteases. Also, unlike the cellular serine proteases, the viral
3C cysteine proteases are believed to cleave both in cis and in
trans (Krausslich and Wimmer, 1988). Whether the
mechanisms of cis and trans catalysis differ
has not yet been addressed.

If the 3C cysteine proteases and cellular serine proteases
are structurally and functionally related, it may be possible
to convert a viral 3C cysteine protease to a serine protease by
substituting a limited set of amino acids to compensate for
the Cys-146 to Ser change, which by itself inactivates 3Cpro.
Support for this concept comes from the observation men-
tioned earlier that S. aureus (strain V8) protease is a serine
protease which cleaves after Glu residues and has a Thr-141/
His-160/Gly-162 complement of amino acids in the substrate-
binding pocket (Drapeau, 1978; Bazan and Fletterick, 1988).
In addition, animal flaviviruses and pestiviruses code for
3Cpro-like serine proteases with Arg/Lys cleavage specificity
and only limited homology with the trypsin class of serine
proteases in and around the substrate-binding pocket (Bazan
and Fletterick, 1989).

In conclusion, our site-directed mutagenesis results, combined with knowledge of the physicochemical properties of purified 3C proteases and with x-ray crystal structure data, will lead to a better understanding of the catalytic mechanism utilized by this unusual class of proteases.
squares with the computer program CORELS (10). The positional parameters of individual atoms were then refined subject to stereochemical restraints by using the subcell data (tf). The positions of missing side-chain atoms and those of the benzamidine and calcium were determined from the subcell difference electron density map computed from the refined model. A model of the full crystallographic asymmetric unit in the correct P2₁2₁2₁ unit cell was then constructed by adding a replicate of the trypsin molecule translated by 46 Å along a and 32 Å along c. The full model was refined in three stages. In each stage the model was refit to a difference Fourier map computed with the coefficients (2Fobs − Fcalc). Strong peaks in the electron density in positions consistent with hydrogen bond contacts to the protein or other established solvent positions were included in the model as ordered solvent. Next, the positional and thermal parameters of all atoms were refined by iterations of restrained crystallographic least squares, with data in the resolution range 6 Å ≥ d ≥ 2.3 Å. Refinement was stopped when further cycles failed to reduce the crystallographic R factor and when the mean shift in coordinate positions was less than 0.05 Å. Refined coordinates were then used to compute phases for a new electron density map to be used in the next stage of manual refitting. After the third stage (R factor = 0.18), examination of the electron density failed to reveal errors or ambiguity in main- or side-chain positions, although the side chains of six residues located at the surface of the molecules were disordered and could not be defined. Up to this point, side-chain atoms for His57, Asn102, and Ser195 had been excluded from the model. A difference electron density map (Fobs − Fcalc) revealed strong and well-ordered density for Asn102 and Ser195, but the His57 residue appeared to be statistically disordered (Fig. 2, top) (11).

10. J. L. Sussman, S. R. Holbrook, G. M. Church, S. H. Kim, Acta Crystallogr. A32, 311 (1976).

11. The possibility that one or the other of the peaks is artifactual was tested by independent refinement of two alternative models: one with His57 fit to the stronger, internal density and the second with His57 fit to the external density. In each model the His57 atoms were assigned full occupancy and side-chain positions for Asn102 and Ser195 were included. Each model was subjected to restrained crystallographic refinement by varying the thermal and positional parameters of all atoms. Subsequently, a difference Fourier map (Fobs − Fcalc) was computed for each model with the use of the refined positional and thermal parameters for all of the atoms in the respective models. In both cases, residual electron density appeared at the alternative histidine site. Again, the observed density peaks were contiguous with the Cβ atom of His57 and thus could not be interpreted as ordered water molecules. The relative occupancy of the two histidine positions and the total occupancy of both positions relative to other histidine side chains were estimated by integration of difference electron density at all of the histidine side-chain positions in one of the trypsin molecules in the asymmetric unit. The (Fobs − Fcalc) difference Fourier map used for integration was computed from a model in which the side-chain atoms of all four histidine residues (at sequence positions 40, 57, 70, and 87) were removed from the coordinate set of one molecule. Integration was performed manually by summing over all grid points within 2.0 Å of histidine atomic positions that had electron density at least one standard deviation greater than the background density. After normalization the apparent relative integrated difference densities at the histidine side-chain positions were: His40, 0.87; His57, 0.60; His70, 0.79; and His87, 1.0. All but His57 are well ordered, so the range in integrated densities reflects thermal motion and experimental error. The sum of the density over the two His57 side-chain sites is lower than the mean density of the well-ordered histidine side chains but is consistent with the high B factors of His57 atoms at both positions. The relative occupancy of the alternative His57 positions was estimated by integrating the difference density at the Nδ1 and Cε1 atoms of the gauche conformer and the Cδ2 and Nε2 atoms of the trans conformer and by taking the ratio of the integrated densities for the two positions. The remaining histidine atoms were not included in the integration because the resolution of the data set did not allow the densities of the two conformers to be resolved at those positions.

Final refined positional and thermal parameters for both trans and gauche conformers were determined by refining an atomic model in which both conformers were simultaneously included. Side-chain atoms of the gauche conformer were assigned occupancies of 0.67 and atoms of the trans isomer were assigned occupancies of 0.33, based on the estimate derived from the integration described above (12). After three final cycles of refinement of all thermal and positional parameters of both trypsin monomers in the asymmetric unit, the crystallographic R factor was 0.161.

12. A modified version of PROTIN (obtained from J. Smith) does not generate restraints between alternate side-chain positions of a statistically disordered residue. This allows two conformations of an amino acid to be refined simultaneously.

13. W. Bode and P. Schwager, J. Mol. Biol. 98, 693 (1975).

14. R. Henderson, J. Mol. Biol. 54, 341 (1970).

15. An upper estimate of the mean error in atomic position is 0.25 Å. It was obtained by an analysis of the variation of crystallographic R factor as a function of resolution (16).

16. V. Luzzati, Acta Crystallogr. 6, 142 (1953).

17. A. A. Kossiakoff and S. A. Spencer, Biochemistry 20,
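Note 11 describes the His57 occupancy estimate in words. The Python sketch below restates that bookkeeping under stated assumptions: the difference map is supplied as a list of (x, y, z, rho) grid points in Å, the background level and its standard deviation are given by the caller, and the function names are ours, not those of any program used by the authors. It sums density within 2.0 Å of the chosen atoms above background plus one sigma, normalizes the integrals, and splits a two-conformer site in proportion to the integrated densities.

```python
import math


def integrate_density(grid, atoms, radius=2.0, background=0.0, sigma=1.0):
    """Sum difference density over grid points lying within `radius` (Å) of
    any atom in `atoms` and at least one sigma above background.

    `grid` is a list of (x, y, z, rho) tuples; `atoms` is a list of (x, y, z)
    tuples. Each grid point is counted at most once.
    """
    threshold = background + sigma
    total = 0.0
    for x, y, z, rho in grid:
        if rho < threshold:
            continue  # below background + 1 sigma: ignored
        if any(math.dist((x, y, z), atom) <= radius for atom in atoms):
            total += rho
    return total


def normalize(integrals):
    """Scale integrated densities so that the largest value becomes 1.0."""
    top = max(integrals.values())
    return {name: value / top for name, value in integrals.items()}


def two_site_occupancies(gauche, trans):
    """Split a disordered side chain between two conformers in proportion
    to their integrated densities; returns (gauche, trans) occupancies."""
    total = gauche + trans
    return gauche / total, trans / total


# Relative densities reported in note 11 (His87 = 1.0).
reported = {"His40": 0.87, "His57": 0.60, "His70": 0.79, "His87": 1.0}
print(normalize(reported)["His57"])

# A hypothetical 2:1 gauche:trans density ratio gives the 0.67/0.33 split.
g, t = two_site_occupancies(2.0, 1.0)
print(round(g, 2), round(t, 2))
```

Splitting occupancy by a simple density ratio mirrors the note's use of only the four resolvable atoms (Nδ1 and Cε1 for gauche, Cδ2 and Nε2 for trans); a real analysis would also need the map's grid spacing and the refined B factors.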



The Catalytic Role of the Active Site Aspartic Acid in
Serine Proteases

Charles S. Craik, Steven Roczniak,* Corey Largman,†
William J. Rutter

The role of the aspartic acid residue in the serine protease catalytic triad Asp, His, and Ser has been tested by replacing Asp102 of trypsin with Asn by site-directed mutagenesis. The naturally occurring and mutant enzymes were produced in a heterologous expression system, purified to homogeneity, and characterized. At neutral pH the mutant enzyme activity with an ester substrate and with the Ser195-specific reagent diisopropylfluorophosphate is approximately 10^4 times less than that of the unmodified enzyme. In contrast to the dramatic loss in reactivity of Ser195, the mutant trypsin reacts with the His57-specific reagent, tosyl-L-lysine chloromethyl ketone, only five times less efficiently than the unmodified enzyme. Thus, the ability of His57 to react with this affinity label is not severely compromised. The catalytic activity of the mutant enzyme increases with increasing pH, so that at pH 10.2 the kcat is 6 percent that of trypsin. Kinetic analysis of this novel activity suggests that this is due in part to participation of either a titratable base or hydroxide ion in the catalytic mechanism. By demonstrating the importance of the aspartate residue in catalysis, especially at physiological pH, these experiments provide a rationalization for the evolutionary conservation of the catalytic triad.

SERINE PROTEASES FUNCTION IN many biological systems to hydrolyze specific polypeptide bonds. Trypsin, a well-studied member of this family, catalyzes the hydrolysis of peptide and ester substrates that contain lysyl or arginyl side chains. Serine proteases have the triad of residues Asp102, His57, and Ser195 at the active site (chymotrypsin numbering system). X-ray crystallographic studies reveal that these three residues are in close proximity, which suggests they may serve as a functional interacting unit responsible for bond formation and cleavage during catalysis (1). Numerous chemical and physical studies indicate that Ser195 and His57 play crucial roles in catalysis. For example, selective reaction of Ser195 with diisopropylfluorophosphate (DFP) (2) or modification of the His57 of trypsin with tosyl-L-lysine chloromethyl ketone (TLCK) (3) blocks catalytic activity. The collective data suggest that substrate hydrolysis is facilitated through nucleophilic attack by the Ser195 hydroxyl oxygen on the carbonyl carbon of the substrate. Concomitantly, the hydroxyl proton of the serine can be transferred to the imidazole of His57 and subsequently donated to the resulting leaving group (alcohol or amine) in the reaction. The remaining acyl enzyme intermediate is hydrolyzed by a mechanism that is the reverse of its formation, except that water instead of Ser195 serves as the nucleophile. The role of the buried carboxylate of Asp102 in the catalytic process remains to be clarified experimentally.

C. S. Craik, Departments of Pharmaceutical Chemistry and of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143-0446. S. Roczniak, C. Largman, W. J. Rutter, Hormone Research Institute and Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143-0448.

*Present address: NutraSweet Company, Mount Prospect, IL 60056.

†Present address: Veterans Administration Hospital, Martinez, CA 94553, and Departments of Internal Medicine and Biological Chemistry, University of California, Davis, CA 95616.
The geometric relation of the amino acids in the catalytic triad led to the postulate that Asp102 serves in concert with the histidine imidazole group to transfer the proton from the serine in a charge-relay mechanism (4). However, nuclear magnetic resonance (NMR) studies (5) showed that the Asp102 and the His57 moieties displayed normal pKa values (Ka is the ionization constant); this is incompatible with the implications of the charge-relay mechanism (6). Furthermore, neutron diffraction and ¹H NMR studies of the imidazole nitrogens in the resting state of the enzyme show that no proton transfer occurs from His57 to Asp102 (7). Asp102 may be involved in the stabilization of the imidazolinium intermediate and in the orientation of the correct tautomer of His57 relative to Ser195 and the substrate (8). However, a test of the function of Asp102 by selective chemical modification has not been possible because it is inaccessible to chemical reagents under nondenaturing conditions. We have evaluated the catalytic role of Asp102 by replacing this residue with Asn. This eliminates the negative charge with little change in the van der Waals surface of the side-chain atoms (NH2 versus OH).

Table 1. Ratios of activity for trypsin and D102N trypsin. Assays for Z-Lys-S-Bzl were performed at pH 7.15 and 10.18 (see legend to Fig. 1 for a description of the experimental conditions). Values for kobs/[I] with DFP were determined by the method of Kitz and Wilson (24). Standard conditions (25) were used except when the initial DFP concentration was 10 mM in assays with D102N trypsin at pH 10.03; background hydrolysis of DFP was relatively rapid and enzymatic activity at infinite times did not equal zero. In this case the kobs/[I] value (where [I] is the concentration of inhibitor) was determined by the method of Yoshimura et al. (26). Values of kobs/[I] from assays with trypsin were calculated to be 790 ± 80 M⁻¹ min⁻¹ (pH 7.96) and 980 ± 70 M⁻¹ min⁻¹ (pH 10.03). In assays with D102N trypsin these values were 0.070 ± 0.005 M⁻¹ min⁻¹ (pH 7.96) and 0.098 ± 0.019 M⁻¹ min⁻¹ (pH 10.03). Titrations with MUGB were followed at 360 nm on a Perkin-Elmer LS5 spectrofluorometer and performed in triplicate in 50 mM Hepes buffer, pH 7.5, that contained 2 µM MUGB. Titrations of trypsin were complete in 2 seconds (the minimum detection time of the fluorometer) or less when enzyme concentrations ranged from 50 nM to 400 nM. Approximately 17 minutes elapsed before a molar equivalence of MUGB reacted with 400 nM D102N trypsin. Values for kobs/[I] with TLCK were determined by the method of Kitz and Wilson (24); standard conditions were used (27). kobs/[I] values from assays with trypsin were calculated to be 760 M⁻¹ min⁻¹ (pH 7.16) and 387 M⁻¹ min⁻¹ (pH 8.77). In assays with D102N trypsin these values were 149 M⁻¹ min⁻¹ (pH 7.16) and 281 M⁻¹ min⁻¹ (pH 8.77). The instability of TLCK and MUGB at alkaline pH values precluded these assays at higher pH values.

Ligand        Kinetic constant    Relative activity
                                  Neutral pH    Alkaline pH
Z-Lys-S-Bzl   kcat                4,400         18
Z-Lys-S-Bzl   kcat/Km             11,300        152
DFP           kobs/[I]            11,300        10,000
MUGB          Titration rate      >500          -
TLCK          kobs/[I]            5.1           1.4
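As a quick arithmetic check, the fold decreases in Table 1 can be recomputed from the rate constants quoted in its legend. This Python sketch only divides the quoted numbers; the rounding is ours, and the MUGB bound assumes the 2 second detection limit stated in the legend.

```python
# (trypsin, D102N trypsin) kobs/[I] values in M^-1 min^-1, Table 1 legend.
LEGEND = {
    ("DFP", "pH 7.96"): (790, 0.070),
    ("DFP", "pH 10.03"): (980, 0.098),
    ("TLCK", "pH 7.16"): (760, 149),
    ("TLCK", "pH 8.77"): (387, 281),
}


def fold_decrease(wild_type, mutant):
    """Ratio of trypsin activity to D102N trypsin activity."""
    return wild_type / mutant


for (ligand, condition), (wt, mut) in LEGEND.items():
    print(f"{ligand:<5} {condition:<9} {fold_decrease(wt, mut):>9.1f}")

# MUGB: trypsin titration in <= 2 s versus ~17 min for D102N trypsin.
mugb_lower_bound = (17 * 60) / 2  # 510.0, reported as ">500"

# Alkaline Z-Lys-S-Bzl ratios expressed as percent of trypsin activity.
kcat_percent = 100 / 18      # ~5.6, quoted as "6%" in the text
kcat_km_percent = 100 / 152  # ~0.66, quoted as "1%" in the text
```

The DFP ratios round to the tabulated 11,300 and 10,000, and the TLCK ratios to 5.1 and 1.4.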

Conversion of the Asp102 codon (GAC) to an Asn (AAC) codon within the rat anionic trypsinogen DNA (9) was accomplished by site-directed mutagenesis (10).

Fig. 1. Profile of activities for trypsin- and D102N trypsin-catalyzed hydrolysis of Z-Lys-S-Bzl. (A) Plot of log(kcat/Km) versus pH and (B) plot of log kcat versus pH, for trypsin (●) and D102N trypsin (○). Assays were performed at 25°C in 50 mM Mes [2-(N-morpholino)ethanesulfonic acid], Mops, or Taps buffers, pH 4.43 to 8.77, or 50 mM glycine, pH 9.25 to 10.18, that contained 0.1 M NaCl and 1 mM CaCl2. Stock solutions of Z-Lys-S-Bzl and 4,4'-dithiodipyridine were prepared in water and dimethylformamide, respectively. The pH of all reactions was determined immediately after reaction. To a cuvette that contained 0.97 ml of the assay solution were added 10 µl of a 25 mM solution of 4,4'-dithiodipyridine (final concentrations: 250 µM 4,4'-dithiodipyridine and 1% dimethylformamide) and 10 µl of a Z-Lys-S-Bzl stock solution. The concentration of substrate ranged from ten times greater than to ten times less than the Km of the enzyme. After the background rate of hydrolysis was measured spectrophotometrically (Beckman DU-7) at 324 nm, 10 µl of an enzyme stock solution (in the case of trypsin, diluted in 0.5 mg per milliliter of bovine serum albumin) was added and the initial rate of hydrolysis was measured. At pH values greater than 9.25, for which the background hydrolysis was substantial (up to 2% of Z-Lys-S-Bzl hydrolyzed per minute), a reference cell that contained substrate and 4,4'-dithiodipyridine was used during kinetic measurements. In all of the assays the initial rates were measured from data for the initial 5 to 10% of the hydrolysis of substrate. Z-Arg-S-Bzl was not used as substrate because this compound shows a background hydrolysis rate 20 times greater than that for Z-Lys-S-Bzl at alkaline pH (14). Substrate and enzyme concentration determinations were performed with standard procedures (29, 30). Values for the kcat and Km parameters from all assays were derived by a program that performed a weighted linear and nonlinear least squares regression analysis of data by using the Lineweaver-Burk and Michaelis-Menten equations, respectively (31). Double reciprocal plots of the data were linear in all cases. Values of pKa were determined by the program MULTI (32), which performs a nonlinear least squares analysis of the data.
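The legend states that kcat and Km were obtained by weighted regression on the Lineweaver-Burk and Michaelis-Menten equations, but it does not describe the program (31) or its weighting scheme. The Python sketch below shows only the linear, Lineweaver-Burk half of that analysis, assuming caller-supplied weights (default 1). The constants KCAT, KM, and E0 are hypothetical, and the substrate range spans 0.1 to 10 times Km as in the legend.

```python
def weighted_linear_fit(xs, ys, weights=None):
    """Return (slope, intercept) of the weighted least-squares line."""
    if weights is None:
        weights = [1.0] * len(xs)
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def lineweaver_burk(substrate, rates, enzyme, weights=None):
    """Fit 1/v = (Km/Vmax)(1/[S]) + 1/Vmax and return (kcat, Km).

    substrate and enzyme are molar; rates are initial rates in M/min.
    """
    slope, intercept = weighted_linear_fit(
        [1.0 / s for s in substrate], [1.0 / v for v in rates], weights)
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax / enzyme, km


# Synthetic, noise-free initial rates (hypothetical constants).
KCAT, KM, E0 = 100.0, 1.0e-5, 1.0e-9  # min^-1, M, M
substrate = [KM * f for f in (0.1, 0.3, 1.0, 3.0, 10.0)]
rates = [KCAT * E0 * s / (KM + s) for s in substrate]
kcat, km = lineweaver_burk(substrate, rates, E0)
print(f"kcat = {kcat:.1f} min^-1, Km = {km:.2e} M")
```

With real data, unweighted double-reciprocal fits overweight the lowest-substrate points, which is presumably why the authors used a weighted fit alongside a nonlinear Michaelis-Menten fit.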
The DNA that encodes the mutant enzyme was sequenced in its entirety to ensure that no inadvertent base changes were introduced during the mutagenesis procedure. The mutant enzyme trypsin Asn102 (Asp102 → Asn), referred to as D102N trypsin, and the naturally occurring trypsin were expressed under the control of the simian virus 40 (SV40) early promoter (11) in stably transformed eukaryotic cell lines that secreted the zymogen form of the enzymes into the culture medium (12). D102N trypsin and trypsin were purified to homogeneity and crystallinity by a combination of ion-exchange and affinity chromatography techniques. Trypsin isolated from this expression system displayed physical and catalytic properties identical to those of trypsin purified from the rat pancreas. In contrast, D102N trypsin exhibited dramatically different catalytic activity.

The activities of trypsin and D102N trypsin toward various substrates and inhibitors are compared in Table 1. At neutral pH the catalytic efficiency of D102N trypsin, as measured by its ability to hydrolyze the ester substrate N-benzyloxycarbonyl-L-lysine thiobenzyl ester (Z-Lys-S-Bzl), is severely compromised (kcat and kcat/Km values are ~10^4 times lower than those of trypsin; kcat is the catalytic rate constant and Km is the Michaelis constant). However, the relative activity of the mutant enzyme progressively increases with increasing pH values. To determine the relative reactivity of Ser195 and His57, both enzymes were treated with the specific active site-directed reagents DFP and TLCK. The inhibition of D102N trypsin by DFP, which is specific for Ser195, is approximately four orders of magnitude slower than that of trypsin at both pH 8.0 and pH 10.0. The active site titrant 4-methylumbelliferyl p-guanidinobenzoate (MUGB) (13) also reacts with D102N trypsin at a rate at least 500-fold slower than with trypsin at pH 7.5. These data suggest that the nucleophilicity of Ser195 is dependent on the negative charge of Asp102.
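The DFP and TLCK comparisons rest on pseudo-first-order inactivation: with inhibitor in large excess, activity decays exponentially, the slope of ln(activity) against time gives kobs, and kobs divided by [I] gives the second-order values compared in Table 1. The Python sketch below shows that step only. It is a simplified stand-in for, not a reproduction of, the Kitz and Wilson (24) and Yoshimura et al. (26) procedures cited in the Table 1 legend, and the time course is hypothetical.

```python
import math


def pseudo_first_order_kobs(times, activities):
    """Least-squares slope of ln(activity) versus time, returned as a
    positive first-order rate constant (per time unit of `times`)."""
    logs = [math.log(a) for a in activities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    return -sxy / sxx


def apparent_second_order(times, activities, inhibitor_molar):
    """kobs/[I] for an inhibitor present in large excess over enzyme."""
    return pseudo_first_order_kobs(times, activities) / inhibitor_molar


# Hypothetical time course: 1 mM inhibitor and a true kobs/[I] of
# 790 M^-1 min^-1, the trypsin value quoted for DFP at pH 7.96.
times = [0, 1, 2, 3, 4, 5]  # min
activity = [100.0 * math.exp(-790 * 1e-3 * t) for t in times]
print(round(apparent_second_order(times, activity, 1e-3)))
```

When [I] is well below the inhibitor's dissociation constant, kobs/[I] approximates the ratio kinact/KI of the Kitz and Wilson scheme; at higher [I], kobs saturates and the full double-reciprocal treatment is needed.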

The substrate analog TLCK reacts specifically with His57, presumably because the substrate-binding pocket positions the reactive chloromethyl ketone group adjacent to His57. In contrast to the large decreases in activity monitored with DFP and MUGB, TLCK is five times less reactive with D102N trypsin than with trypsin at neutral pH (pH 7.2) and one and a half times less reactive at more alkaline pH (pH 8.8). Thus the active site reacts virtually normally with the affinity reagent. The differential effect of the Asp to Asn substitution on the inhibition of D102N trypsin by DFP and TLCK may be due to differences in the proximity of the reactive groups of the inhibitors and the enzyme. However, a more likely explanation is that the imidazole of His57 in D102N trypsin is not in the correct tautomeric state for removal of the Ser195 proton, which reduces the reactivity of the enzyme toward DFP. His57 can nevertheless still react with the chloromethyl ketone moiety of TLCK and thereby inhibit the enzyme.

The modified and unmodified enzymes exhibit different pH activity profiles for the ester substrate (Table 1 and Fig. 1). Similar data have been obtained with peptide substrates (14). In agreement with studies on bovine cationic trypsin (15), rat anionic trypsin shows a sigmoidal dependence of activity (pKa = 6.8) with maximal kcat and kcat/Km values of 7498 ± 254 min⁻¹ and 1.20 ± 0.28 × 10^? M⁻¹ min⁻¹, respectively (16, 17). The rat enzyme resembles porcine elastase (18) but differs from bovine trypsin in being alkaline stable. The dominant effect of the Asp to Asn mutation is on kcat. The Km values of the two enzymes are similar at any given pH value. The D102N trypsin activity is dramatically lower (~10^4 times as measured by kcat or kcat/Km) than trypsin activity at neutral pH values; however, it increases progressively at alkaline pH values from the low value at neutral pH to values
Fig. 2. The pH dependence of the kinetic parameter kcat/Km of D102N trypsin-catalyzed hydrolysis of Z-Lys-S-Bzl. The points correspond to the experimentally derived kcat/Km values. Curve A is derived by substituting the calculated rate and equilibrium constants kOH, kenz, K1, and K2 into Eq. 1. Values for kOH and K2 were determined from assays performed from pH 8.36 to 10.18, where it is assumed that K1 >> [H⁺] and kOH[OH⁻] >> kenz. Equation 1 can then be simplified and rearranged to describe a straight line: $(k_{cat}/K_m)[\mathrm{H^+}] = -K_2(k_{cat}/K_m) + (10^{-14})k_{OH}$. Linear regression of this line yields kOH and K2 values of 1.45 ± 0.12 × 10^? M⁻² min⁻¹ and 1.21 ± 0.30 × 10^-10 M, respectively. Values of kenz and K1 were determined from assays performed from pH 4.43 to 7.33, where [H⁺] >> K2. By using the kOH value determined above, Eq. 1 can again be simplified to a linear form: $\left[(k_{cat}/K_m)[\mathrm{H^+}] - 1.45\times10^{?}\right]/[\mathrm{H^+}] = (1/K_1)\left[1.45\times10^{?} - (k_{cat}/K_m)[\mathrm{H^+}]\right] + k_{enz}$. Linear regression analysis of this line yields kenz and K1 values of 4.78 ± 0.22 × 10^? M⁻¹ min⁻¹ and 3.67 ± 0.32 × 10^-6 M, respectively. Inset: Plot of kcat/Km versus pH from pH 4.43 to 7.33. Curve A is the same as described above. Curve B describes the contribution to the catalytic rate of D102N trypsin that depends on [OH⁻]: $k_{OH}[\mathrm{OH^-}]/(1 + K_2/[\mathrm{H^+}])$. Curve C describes the contribution to the catalytic rate of D102N trypsin independent of [OH⁻], detected at lower pH values: $k_{enz}/[1 + ([\mathrm{H^+}]/K_1) + (K_2/[\mathrm{H^+}])]$. Note that curve A is the sum of curves B and C. The dotted line perpendicular to the abscissa is the pKa of the mutant enzyme calculated from the inflection point of the activity profile.

Table 2. Values for kOH and pKa derived from the D102N trypsin-catalyzed hydrolysis of Z-Lys-S-Bzl. The kOH, Km, and pKa parameters derived from kcat/Km values were determined as described in the legend to Fig. 2. The pK2 values for ... and ... were not determined due to experimental constraints described below. The kcat parameter does not appear to depend on the ionization of a residue in the pH range between 4 and 8. Equation 1 can then be reduced to:

$$k_{cat} = \frac{k_{enz}}{1 + K_2/[\mathrm{H^+}]} + \frac{k_{OH}[\mathrm{OH^-}]}{1 + K_2/[\mathrm{H^+}]}$$

Values for kOH and K2 can be determined from assays performed at pH values of 8 and greater, where it is assumed that kOH[OH⁻] >> kenz. The equation can then be rearranged to the linear form $k_{cat}[\mathrm{H^+}] = -K_2 k_{cat} + (10^{-14})k_{OH}$. Linear regression analysis of this line with data from pH 7.96 to 10.18 yields a kOH value of 5.50 ± 0.21 × 10^? M⁻¹ min⁻¹ and a K2 value of 5.9 ± 0.50 × 10^? M. The value of kenz can be estimated from assays performed at pH values less than 8, where [H⁺] >> K2. By using the kOH value determined above, the equation can be reduced to $k_{cat} = k_{enz} + 5.50\times10^{?}[\mathrm{OH^-}]$. Subtracting the calculated 5.50 × 10^?[OH⁻] values from the experimentally derived kcat values from pH 4.43 to pH 7.33 gives a kenz value of 0.37 ± 0.09 min⁻¹. The pH dependence of the acylation rate constant k2 of the D102N trypsin-catalyzed hydrolysis of Z-Lys-S-Bzl was determined by performing assays at 25°C in 50 mM Mes, Mops, or Taps buffers, pH 4.81 to 8.36, under conditions identical to those described in the legend to Fig. 1, except that D102N trypsin concentrations (4 to 40 µM) were in large excess over the initial substrate concentration (0.54 µM) and the reaction was allowed to proceed to completion. Assays performed at pH values above 8.4 were too fast to follow spectrophotometrically, thereby preventing the determination of k2 (acylation) values. Values for k2 and Ks were determined by the procedure of Kezdy and Bender (28). The kOH ...

$$\frac{k_2[\mathrm{H^+}] - 4.91\times10^{?}}{[\mathrm{H^+}]} = \frac{1}{K_1}\left(4.91\times10^{?} - k_2[\mathrm{H^+}]\right) + k_2'$$

Linear regression analysis of this line with k2 values determined from assays performed from pH 4.81 to pH 6.70 yielded a k2' value of 1.32 ± 0.08 min⁻¹ and a K1 value of 5.35 ± 1.00 × 10^? M. Values for k3 (deacylation) were calculated using the experimentally derived kcat and k2 values and the equation $k_3 = k_{cat}k_2/(k_2 - k_{cat})$. The kOH value was determined from a plot of the k3 values versus solvent hydroxide ion concentration from pH 6.70 to 8.36; kOH = 4.7 ± 2.43 × 10^? M⁻¹ min⁻¹. The maximal value of the deacylation rate constant of the hydroxide-independent pathway, k3', was calculated by incorporating the maximal values for k2 and kcat determined above into the equation $k_3 = k_{cat}k_2/(k_2 - k_{cat})$. This gives a k3 (deacylation) of 0.51 ± 0.07 min⁻¹. The value of k3, like kcat, shows
that approach those of the native enzyme (kcat 6%; kcat/Km 1%) at pH 10.2.

The ascendant alkaline limb of the activity-pH profiles of D102N trypsin is not an artifact due to deamidation of the Asn residue to Asp, since mutant enzyme activity at neutral pH is not affected by preincubation at alkaline pH. Furthermore, one would expect the pH activity profiles to be similar in shape to those of the naturally occurring enzyme if they merely reflected contamination by trypsin. We ascribe this ascendant basic limb to the participation of a titratable base or bases or of OH⁻ itself. Although the mechanism of catalysis by D102N trypsin is unknown, the pH rate profile of kcat/Km can be described by a bipartite rate equation in which one part represents the catalytic rate detected at the lower pH values and the other part describes the catalytic rate that shows a dependence on hydroxide ion concentration (19). The observed rate constant kcat/Km can be defined as:

$$\frac{k_{cat}}{K_m} = \frac{k_{enz}}{1 + [\mathrm{H^+}]/K_1 + K_2/[\mathrm{H^+}]} + \frac{k_{OH}[\mathrm{OH^-}]}{1 + K_2/[\mathrm{H^+}]} \qquad (1)$$

where kenz is the rate constant of the hydroxide-independent pathway, K1 and K2 are the dissociation constants of the ionizing groups, and kOH is the rate constant of the hydroxide ion-dependent pathway. The catalytic activities of the OH⁻-activated and OH⁻-independent pathways can be resolved with Eq. 1. Values for kcat/Km determined from mutant enzyme activity studies above pH 8.0 show an increase with solvent hydroxide ion concentration that yields kOH and K2 values of 1.45 ± 0.12 × 10^? M⁻² min⁻¹ and 1.21 ± 0.30 × 10^-10 M (pK2 = 9.9), respectively. Between pH 8.0 and pH 8.8 the kcat/Km values increase linearly with hydroxide ion concentration. The slight decrease from linearity above pH 8.8 may reflect the ionization of another group with an alkaline pKa value, such as the lysine substrate or the amino-terminal group of the protein (20).
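Equation 1 and the high-pH linearization used to extract kOH and K2 can be checked numerically. The Python sketch below evaluates curves A, B, and C of Fig. 2 and then recovers kOH and K2 from noise-free points generated by the hydroxide term alone, mirroring the assumption kOH[OH⁻] >> kenz made in the Fig. 2 legend. K1 and K2 are the values quoted in the text (their powers of ten follow from the stated pK1 = 5.4 and pK2 = 9.9). The powers of ten for kenz and kOH are illegible in the scanned source, so K_ENZ and K_OH here are illustrative assumptions only.

```python
import math

KW = 1.0e-14  # ion product of water (M^2), as in the (10^-14) kOH term

# K1, K2 as quoted for kcat/Km; K_ENZ and K_OH are illustrative values.
K1, K2 = 3.67e-6, 1.21e-10
K_ENZ, K_OH = 4.78e3, 1.45e10


def curve_b(pH, k_oh=K_OH, K2=K2):
    """Hydroxide-dependent term of Eq. 1: kOH[OH-]/(1 + K2/[H+])."""
    h = 10.0 ** -pH
    return k_oh * (KW / h) / (1.0 + K2 / h)


def curve_c(pH, k_enz=K_ENZ, K1=K1, K2=K2):
    """Hydroxide-independent term: kenz/(1 + [H+]/K1 + K2/[H+])."""
    h = 10.0 ** -pH
    return k_enz / (1.0 + h / K1 + K2 / h)


def curve_a(pH):
    """Eq. 1: curve A is the sum of curves B and C."""
    return curve_b(pH) + curve_c(pH)


def fit_high_pH(pHs, values):
    """Regress y = (kcat/Km)[H+] on x = kcat/Km; the slope is -K2 and the
    intercept is Kw*kOH. Returns (kOH, K2)."""
    xs = list(values)
    ys = [v * 10.0 ** -p for p, v in zip(pHs, xs)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return (my - slope * mx) / KW, -slope


# Noise-free points from the hydroxide term only, pH 8.36 to 10.18.
pHs = [8.36, 8.8, 9.25, 9.7, 10.18]
k_oh_fit, k2_fit = fit_high_pH(pHs, [curve_b(p) for p in pHs])
print(f"kOH = {k_oh_fit:.3g}, K2 = {k2_fit:.3g} M")
print(f"pK1 = {-math.log10(K1):.1f}, pK2 = {-math.log10(K2):.1f}")
```

Because curve B equals kOH·Kw/([H⁺] + K2), the product (kcat/Km)[H⁺] is exactly linear in kcat/Km, which is why the regression recovers the inputs. With real data the kenz term is small but not zero at pH 8.4, one reason the authors restrict that fit to the alkaline range.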

There is good agreement between the calculated kcat/Km curve derived from Eq. 1 and the experimentally derived values (Table 2 and Fig. 2). Measurements of kcat/Km values below pH 8.0 yield kenz and K1 values of 4.78 ± 0.22 × 10^? M⁻¹ min⁻¹ and 3.67 ± 0.32 × 10^-6 M (pK1 = 5.4), respectively. A comparison of the kenz value for D102N trypsin and the maximal kcat/Km value for trypsin indicates that the activity of the mutant enzyme (ignoring the contribution of the OH⁻-dependent pathway) is 25,000 times less than that of trypsin. Thus Asp102 is crucial for catalytic activity at neutral pH values. However, the rate of hydrolysis by the mutant enzyme is still 400 times greater than the rate of solvent hydrolysis of the substrate. The inflection points of the curves in Fig. 2 suggest that the pKa of His57 has decreased by 1.5 pH units in D102N trypsin compared to trypsin. The putative alteration in the pKa value of His57 reflects the replacement of the negatively charged carboxylate group with a neutral amide group. The mutant enzyme exhibits classic burst kinetics on ester substrates below pH 7.0. This implies that an acyl enzyme intermediate accumulates and that deacylation is rate determining in this pH range (14).
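Burst kinetics mean that acylation (k2) outpaces deacylation (k3). For the two-step acyl-enzyme scheme, kcat = k2k3/(k2 + k3), which rearranges to the relation quoted in the Table 2 legend, k3 = kcat·k2/(k2 − kcat). This Python sketch plugs in the hydroxide-independent values given there (kcat 0.37 min⁻¹, k2' 1.32 min⁻¹) and reproduces the quoted k3 of about 0.51 min⁻¹; the function name is ours.

```python
def deacylation_constant(kcat, k2):
    """k3 from kcat = k2*k3/(k2 + k3), i.e. 1/kcat = 1/k2 + 1/k3."""
    if k2 <= kcat:
        raise ValueError("k2 must exceed kcat for a finite, positive k3")
    return kcat * k2 / (k2 - kcat)


# Hydroxide-independent values from the Table 2 legend (min^-1).
kcat_enz, k2_max = 0.37, 1.32
k3 = deacylation_constant(kcat_enz, k2_max)
print(round(k3, 2))  # 0.51
print(k3 < k2_max)   # True: deacylation is the slower step
```

The guard reflects the algebra: kcat can never exceed k2 in this scheme, so a measured kcat at or above k2 signals a data or model problem rather than a negative k3.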

It has been suggested that Asp102 controls the position of the neighboring His57 residue, which in turn modulates the polarity of Ser195 (8). Our demonstration of the crucial role of Asp102 is not surprising in view of the strict evolutionary conservation of this residue within the catalytic triad. The magnitude of the catalytic defect from the Asp102 → Asn replacement and the alkaline activation of the enzyme are unexpected. The three-dimensional structure of D102N trypsin is virtually identical to that of trypsin in the alkaline pH range (21). Thus the activity of the mutant enzyme arises from an active site conformation that resembles the native structure. Certain properties of D102N trypsin superficially resemble those of chymotrypsin methylated at His57 (22). The activity of both enzymes is dramatically lower at neutral pH values and increases in proportion to OH⁻ concentration. However, the rate constant ascribed to the reaction with OH⁻ ions is 1000 times greater for the D102N trypsin mutant than for chymotrypsin with the modified histidine. Nevertheless, these results are consistent with the view that compromising the function of the histidine dramatically decreases catalytic activity at neutral pH values. This defect can be partly overcome at basic pH. The alkaline pH may affect the catalytic reaction indirectly by affecting the ionization of groups that function in catalysis. Alternatively, OH⁻ might participate directly in the reaction; this would require activation at very low hydroxide ion concentrations. The overall catalytic mechanism of the D102N trypsin activity is unknown at present. The activity may be due in part to a nucleophilic contribution from the imidazole nitrogen of His57 instead of Ser195, as has been detected in the cleavage of active esters of nonspecific substrates (23). Alternatively, a residue distant from the active site may contribute to stabilization of the tetrahedral intermediate at basic pH. Whatever the mechanism of action, D102N trypsin displays distinctive properties that distinguish it from trypsin. Its low activity in the neutral pH range makes it an unattractive catalyst for most biological functions; thus it might not be expected to persist in evolution. The Asn mutant, however, is of considerable interest as a distinctive serine protease. This work illustrates the potential for creating new variants that are not found in nature because they are active under extreme conditions that are usually incompatible with cellular environments.
Adrenal Medulla Grafts Enhance Recovery of Striatal
Dopaminergic Fibers



Martha C. Bohn,* Lisa Cupit, Frederick Marciano, 
Don M. Gash 



The drug 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP) depletes striatal dopamine levels in primates and certain rodents, including mice, and produces parkinsonian-like symptoms in humans and nonhuman primates. To investigate the consequences of grafting adrenal medullary tissue into the brain of a rodent model of Parkinson's disease, a piece of adult mouse adrenal medulla was grafted unilaterally into mouse striatum 1 week after MPTP treatment. This MPTP treatment resulted in the virtual disappearance of tyrosine hydroxylase-immunoreactive fibers and severely depleted striatal dopamine levels. At 2, 4, and 6 weeks after grafting, dense tyrosine hydroxylase-immunoreactive fibers were observed in the grafted striatum, while only sparse fibers were seen in the contralateral striatum. In all cases, tyrosine hydroxylase-immunoreactive fibers appeared to be from the host rather than from the grafts, which survived poorly. These observations suggest that, in mice, adrenal medullary grafts exert a neurotrophic action in the host brain to enhance recovery of dopaminergic neurons. This effect may be relevant to the symptomatic recovery in Parkinson's disease patients who have received adrenal medullary grafts.



IN HUMANS, THE DRUG 1-METHYL-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP) produces motor deficits that closely resemble those observed in Parkinson's disease (1-4). This observation has led to the development of animal models of Parkinson's disease that are valuable for studying the effects of brain grafting (5). MPTP damages the dopamine (DA)-containing A9 cell group in the pars compacta of the substantia nigra and results in a degeneration of the nigrostriatal DA fibers and loss of striatal DA and its metabolites (1-8). The severity of this damage is species-dependent. In primates, MPTP treatment damages both the DA fibers and cell bodies (1-5). In mice, the fibers are damaged, but many A9 neurons survive (6, 7). Because the MPTP lesion is transient in mice (7, 9), the MPTP-treated mouse provides an opportunity for studying recovery of identified neurons in the brain. Our study suggests that striatal grafts of adult mouse adrenal medulla enhance recovery of these neurons.

Two MPTP treatments were compared for
their effects on striatal DA levels and tyrosine
hydroxylase-immunoreactivity (TH-IR) in
the striatum and A9 region of C57BL/6 mice
(6 to 12 weeks old; 21 to 28 g). As described
(6, 7), lightly etherized mice received multiple
injections of MPTP-HCl subcutaneously in
0.5 ml of saline. Group A received three
injections of 30 mg per kilogram of body
weight at 24-hour intervals and group B
received two injections of 50 mg per kilogram
of body weight 16 hours apart. Catecholamines
in tissues were isolated and measured
Localization of the mosaic transmembrane serine protease corin to
heart myocytes 
Corin cDNA encodes an unusual mosaic type II transmembrane serine protease, which possesses, in addition to a
trypsin-like serine protease domain, two frizzled domains, eight low-density lipoprotein (LDL) receptor domains,
a scavenger receptor domain, as well as an intracellular cytoplasmic domain. In in vitro experiments, recombinant
human corin has recently been shown to activate pro-atrial natriuretic peptide (ANP), a cardiac hormone essential
for the regulation of blood pressure. Here we report the first characterization of corin protein expression in heart
tissue. We generated antibodies to two different peptides derived from unique regions of the corin polypeptide,
which detected immunoreactive corin protein of approximately 125-135 kDa in lysates from human heart
tissues. Immunostaining of sections of human heart showed corin expression was specifically localized to the
cross striations of cardiac myocytes, with a pattern of expression consistent with an integral membrane
localization. Corin was not detected in sections of skeletal or smooth muscle. Corin has been suggested to be a
candidate gene for the rare congenital heart disease total anomalous pulmonary venous return (TAPVR), as the
corin gene colocalizes to the TAPVR locus on human chromosome 4. However, examination of corin protein
expression in TAPVR heart tissue did not show evidence of abnormal corin expression. The demonstrated corin
protein expression by heart myocytes supports its proposed role as the pro-ANP convertase, and thus a potentially
critical mediator of major cardiovascular diseases including hypertension and congestive heart failure.

Keywords: serine protease; corin; heart; pro-atrial natriuretic peptide (pro-ANP); TAPVR.



Serine proteases are found in all living organisms, ranging from 
viruses to humans [1], where they serve important and varied 
biological functions in situations requiring limited proteolysis. 
Their activities impact on areas as diverse as hemostasis, tissue 
remodelling and wound repair, inflammation, angiogenesis, 
fibrinogenesis and fibrinolysis. Cell surface serine proteases 
have been associated largely with extracellular matrix degra- 
dation, but there are emerging roles for these proteases in 
generating bioactive matrix protein fragments, influencing the 
release, the activation and bioavailability of growth factors and 
in shedding of cell surface proteins [2-6].

Many serine proteases are mosaic proteins comprising 
multiple, structurally distinct domains necessary for regulating 
enzymatic activity. Circulating serine proteases of the blood 
coagulation (e.g. prothrombin and factor X) [7], fibrinolysis 
(e.g. plasminogen activators) [8] and complement (e.g. C1r and
C1s) [9] systems are well characterized examples of mosaic
proteins. While the vast majority of known serine proteases are
secreted, more recently some serine proteases have been found
to possess integral transmembrane domains. The proteins
enteropeptidase [10], hepsin [11] and most recently, TMPRSS2
[12] are examples of mosaic serine proteases with type II
transmembrane domains. These enzymes are positioned on the
plasma membrane via a membrane spanning domain close to
the N-terminus. In addition to membrane spanning and protease
domains, enteropeptidase also contains two low-density
lipoprotein (LDL) receptor domains, a meprin-like domain, two
C1r-like domains and a truncated scavenger receptor domain.
An LDL receptor domain and a scavenger receptor domain 
have also been identified in TMPRSS2 [12]. The functions of 
these domains have not been determined. 

Serine proteases play important roles in several aspects of 
heart physiology and cardiovascular disease [13]. The mast cell 
serine protease chymase is believed to be the major converter of
angiotensin (Ang) I to Ang II in human heart tissue [14]. The
involvement of Ang II in normal cardiac function as well as in
heart ailments such as hypertrophy, heart failure and ischaemic
heart disease is indicated by the finding that inhibition of the
angiotensin converting enzyme (ACE) leads to beneficial
outcomes for sufferers of these diseases [15]. However, ACE
inhibitors block only 10-20% of Ang I conversion in heart tissue
whereas the remaining activity is blocked by serine protease 
inhibitors [16]. The fibrinolytic serine proteases tissue-type 
plasminogen activator (tPA) and urokinase-type plasminogen 
activator (uPA) are also thought to be involved in the 
progression of heart disease. uPA is present at significantly 
elevated levels in the atherosclerotic lesions responsible for 
myocardial infarction and failure [17]. The reduction in tPA 
from arteriolar smooth muscle cells is linked to the develop- 
ment of coronary artery disease in transplanted hearts [18]. 

Our own work and that of Yan et al. [19] have led to the recent
cloning of a cDNA encoding a novel, multidomain type II
transmembrane serine protease from human heart. The
predicted protein, corin, comprises two frizzled domains, eight
LDL receptor domains and a truncated scavenger receptor domain,
in addition to the extracellular trypsin-like serine protease
domain [19]. Recent expression of recombinant corin
demonstrates that it possesses pro-atrial natriuretic peptide (ANP)
convertase activity [20], and thus may play a critical role in the
regulation of hypertension. In situ hybridization studies of
mouse embryonic heart showed that corin mRNA was
expressed as early as day 9.5 and that its expression was
maintained in the adult animal [19]. The corin gene was mapped to
human chromosome 4p12-13 [19], near the locus for the
congenital heart disease total anomalous pulmonary venous
return (TAPVR). Here we present data describing for the first
time native corin protein expression and localization in human
heart.

MATERIALS AND METHODS 

Identification of corin cDNA by homology cloning 

Homology cloning was performed by RT-PCR using degenerate
oligonucleotides corresponding to conserved regions of serine
proteases [21-24]. Total RNA was isolated from Sla cells [25]
following treatment with TNFα and cycloheximide for 4 h.
RNA (5 µg) was reverse transcribed at 42 °C using AMV
reverse transcriptase (Promega, Madison, WI) in the presence of
oligo(dT)12-18 (0.25 µg·µL⁻¹) (Pharmacia Biotech, Sweden),
50 mM Tris/HCl, pH 8.3, 50 mM KCl, 10 mM MgCl2, 10 mM
dithiothreitol and 0.5 mM spermidine in a total volume of
20 µL. PCR was performed using 1 µL of the reverse
transcriptase reaction mixture, 500 ng of each primer, 10 mM
Tris/HCl, pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 0.2 mM dNTPs
and 1-2 units of Taq polymerase (Perkin Elmer). The primers
were as follows: forward, 5'-ACAGAATTCTGGGTIGTIACIGCIGCICAYTG-3';
reverse, 5'-ACAGAATTCAXIGGICCICCI(C/G)(T/A)XTCICC-3'
(where X = A or G, Y = C or T and I = inosine).
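The degeneracy of the two primers can be checked mechanically. The sketch below is an illustration, not part of the original protocol: it uses only the codes defined above (X = A or G, Y = C or T, and bracketed alternatives such as (C/G)) and treats inosine (I) as a single base analogue, so I positions do not multiply the number of distinct sequences. The primer strings are transcribed from the Methods with line-break hyphens removed; both begin with ACAGAATTC, which contains GAATTC, the EcoRI recognition site.

```python
import itertools
import re

# Degenerate positions as defined in the Methods: X = A or G, Y = C or T.
# Inosine (I) is kept as one symbol, so it adds no extra sequence variants.
CODES = {"X": "AG", "Y": "CT"}

FORWARD = "ACAGAATTCTGGGTIGTIACIGCIGCICAYTG"
REVERSE = "ACAGAATTCAXIGGICCICCI(C/G)(T/A)XTCICC"


def positions(primer: str) -> list[str]:
    """Return the allowed bases at each position; '(C/G)' counts as one position."""
    result = []
    for group, base in re.findall(r"\(([ACGT](?:/[ACGT])+)\)|([ACGTIXY])", primer):
        if group:
            result.append(group.replace("/", ""))
        else:
            result.append(CODES.get(base, base))
    return result


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by the primer."""
    count = 1
    for options in positions(primer):
        count *= len(options)
    return count


def expand(primer: str) -> list[str]:
    """List every concrete sequence (inosine left as 'I')."""
    return ["".join(combo) for combo in itertools.product(*positions(primer))]


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(name, len(positions(primer)), "nt, degeneracy", degeneracy(primer))
```

Under these assumptions the forward primer (32 nt) is 2-fold degenerate (one Y position) and the reverse primer (29 nt) is 16-fold degenerate (two X positions plus the (C/G) and (T/A) positions).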

Cycling conditions: 2 cycles of 94 °C for 2.5 min, 35 °C for
2.5 min and 72 °C for 3 min, followed by 33 cycles of 94 °C
for 2.5 min, 57 °C for 2.5 min and 72 °C for 3 min, with a final
extension at 72 °C for 7 min. PCR products of approximately
450 bp were ligated into pGEM-T (Promega, Madison, WI, 
USA), cloned and analysed by DNA sequencing. A DNA 
fragment was identified which represented the partial corin 
sequence (nucleotides 334-748). The cDNA was extended 333 
nucleotides towards the 5' end by screening a cDNA library 
using two rounds of PCR and the nested oligonucleotides 
ATC2P3 and ATC2P1 in combination with the vector specific 
primer T7. The 3' end was extended to nucleotide 976 by two 
rounds of PCR and the nested oligonucleotides ATC2P4 and 
ATC2P5 in combination with the vector specific primer T3. The 
primer sequences are given below. 

ATC2P1: 5'-GCGTGTCTGCATGAACACTG-3'; ATC2P2:
5'-ATGCCAAGCACCACTTTCCA-3'; ATC2P3: 5'-ATAGTCCACCACTGCTCGAC-3';
ATC2P4: 5'-TTAAGCTGCAAGAGGGAGAG-3'.
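The homology-cloning cycling program given above can be written down as data, which makes the cycle count and total hold time easy to check. This is a hypothetical representation (the `Step`/`Stage` names are illustrative, not from the source), and it ignores ramp times between temperatures, so it underestimates real instrument run time.

```python
from dataclasses import dataclass


@dataclass
class Step:
    temp_c: float   # hold temperature in °C
    minutes: float  # hold time in minutes


@dataclass
class Stage:
    cycles: int
    steps: list[Step]


# Cycling program transcribed from the Methods.
PROGRAM = [
    Stage(2, [Step(94, 2.5), Step(35, 2.5), Step(72, 3.0)]),   # annealing at 35 °C
    Stage(33, [Step(94, 2.5), Step(57, 2.5), Step(72, 3.0)]),  # annealing at 57 °C
    Stage(1, [Step(72, 7.0)]),                                  # final extension
]


def amplification_cycles(program: list[Stage]) -> int:
    """Count cycles that include a 94 °C denaturation step (final extension excluded)."""
    return sum(st.cycles for st in program if any(s.temp_c >= 94 for s in st.steps))


def hold_minutes(program: list[Stage]) -> float:
    """Total time spent at hold temperatures, excluding ramping."""
    return sum(st.cycles * sum(s.minutes for s in st.steps) for st in program)


if __name__ == "__main__":
    print(amplification_cycles(PROGRAM), "cycles,", hold_minutes(PROGRAM), "min of holds")
```

Run as a script it reports 35 amplification cycles and 287.0 min of hold time (about 4.8 h before ramping is added).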

The DNA sequence of this cDNA has been deposited in
the DDBJ/Genbank/EMBL database under accession no.
AF113248.

Heart tissue specimens 

Tissues from explanted hearts with terminal heart failure were 
either snap frozen in liquid nitrogen (for RNA and protein 
analyses) or processed for routine histological examination. Six
paraffin embedded blocks of human heart tissue were obtained
from autopsy cases with acute myocardial infarction. These 
blocks included both viable and nonviable myocardium. 
Procedures were in accordance with guidelines established by 
the National Health and Medical Research Council of Australia, 
Ethics Approval number EC9876(n). 

Northern and Poly(A)+ RNA dot blot analyses

Human multiple tissue northern blots (Clontech, Palo Alto, CA,
USA) contained 2 µg of poly(A)+ RNA per lane. The blots
were hybridized with a 32P-dCTP labeled EcoRI digested DNA
fragment encoding corin cDNA in ExpressHyb (Clontech)
solution at 65 °C and washed to a final stringency of
0.2 × NaCl/Cit, 0.1% SDS at 65 °C. The blot was reprobed
with β-actin as a measure of loading in each lane. For the
mouse tissue blot, total RNA was purified from mouse tissues,
separated by denaturing gel electrophoresis and transferred to
Hybond-N nylon membranes as described [26]. The blot was
hybridized with the radiolabelled human corin DNA probe
under lower stringency conditions in ExpressHyb solution at
55 °C and washed to a final stringency of 1 × NaCl/Cit, 0.1%
SDS at 55 °C. The mouse tissue blot was stained with ethidium
bromide to confirm RNA loading in each lane.

Production of affinity purified antipeptide polyclonal
antibodies

Rabbit polyclonal antibodies were generated against corin-specific
peptides derived from nonhomologous hydrophilic
regions within the corin amino-acid sequence. Two peptides,
each containing a cysteine residue incorporated at the C-terminus,
were synthesized (Auspep, Parkville, Australia) and conjugated
to keyhole limpet hemocyanin using m-maleimidobenzoic acid
N-hydroxysuccinimide ester. The peptides were: A1:
IQEQEKEPRWLTLHSNWE-C, A2: GHMGNKMPFKLQEGE-C.
Rabbit antisera were peptide-affinity purified using SulfoLink
coupling gel (Pierce, Rockford, IL). The specificity of each
antibody was tested against the immunogenic peptide by
ELISA.

Western blot analysis 

Frozen heart tissue (100 mg) was homogenized in lysis-binding 
buffer (Dynabeads mRNA Direct kit, Dynal) and spun at 
13 000 × g for 2 min. The protein pellet was dissolved in
reducing SDS-sample buffer for Western blot analysis. Proteins 
were separated by SDS/PAGE on 10% acrylamide gels and 
transferred electrophoretically to Hybond-P membranes 
(Amersham, Aylesbury, UK). Membranes were blocked with 
5% nonfat skim milk powder in Tris/NaCl (10 mM Tris/HCl, 
pH 7.0, 150 mM NaCl), incubated with affinity purified anti- 
peptide antibody, then with horseradish peroxidase conjugated 
sheep anti-(rabbit Ig) secondary antibody, and visualized by 
enhanced chemiluminescence (Amersham, Aylesbury, UK). 

Immunohistochemistry

Paraffin sections (5 µm) of formalin-fixed human heart were
deparaffinized, then rehydrated before antigen retrieval in
boiling 10 mM citric acid buffer, pH 6. After cooling,
endogenous peroxidase activity was inhibited by 10 min
incubation in 1% hydrogen peroxide. Non-specific antibody
binding was blocked by incubating the sections in 4% nonfat
skim milk powder in NaCl/Pi for 15 min, followed by 10%
normal goat serum for 20 min. Affinity purified anticorin A1
(1 : 100; 150 µg·mL⁻¹) or A2 antibodies (1 : 50;
20 µg·mL⁻¹) were applied and incubated overnight in a
humidified chamber at room temperature. Controls included
sections incubated with no primary antibody or antibody that
had been preadsorbed for 2 h at room temperature with 1 µg of
the antigenic peptide. Following incubation with prediluted
biotinylated goat anti-(rabbit Ig) Ig (Zymed, San Francisco,
CA, USA), streptavidin-horseradish peroxidase (Zymed) was
applied and color developed using the chromogen
3,3'-diaminobenzidine with hydrogen peroxide as substrate. The sections
were counterstained in Mayer's haematoxylin.

Fig. 1. Corin expression in human and
mouse tissues. (A) Northern blot analysis of
RNA isolated from a range of normal human
tissues probed with 32P-labelled corin cDNA.
The levels of β-actin mRNA are shown as a
control for loading. (B) Northern blot analysis
of corin mRNA expression in a range of mouse
tissues probed with 32P-labelled human corin
cDNA at reduced stringency. The levels of
18S ribosomal RNA are shown as a control
for loading.



RESULTS AND DISCUSSION 

Isolation of human corin cDNA by homology cloning 

A PCR-based homology cloning approach was employed to
identify serine protease cDNAs expressed by the Sla cell line
[25], which is resistant to tumor necrosis factor-α induced
apoptosis. Degenerate primers designed to anneal to cDNA
encoding the conserved regions surrounding the catalytic
histidine and serine amino acids of serine proteases [21-23]
were used to amplify and then clone a range of DNA fragments
of approximately 450 bp. One clone, designated ATC2, was
found to encode a novel serine protease. The cDNA was
extended in the 5' and 3' directions by library screening and the
DNA sequence was deposited in the DDBJ/Genbank/EMBL
database (accession no. AF113248). This sequence was
subsequently determined to be 100% identical to a recently
reported cDNA encoding the serine protease, corin (accession
no. AF133845) [19].



Corin mRNA is strongly expressed in heart 

The tissue distribution of corin mRNA was examined by
Northern blot analyses. Analysis of poly(A)+ RNA from 16
normal human tissues showed a single transcript of
approximately 5.1 kb detectable only in human heart (Fig. 1A).
Examination of a range of mouse tissues also demonstrated
specific expression of corin mRNA of approximately 5.1 kb
only in mouse heart (Fig. 1B).







Fig. 2. Corin protein expression in human heart tissue by Western blot 
analysis. Immunoreactive corin protein of 125-135 kDa is detected in a 
protein lysate prepared from human heart tissue (Patient #7684), which is 
not detectable in a corin negative HeLa cell lysate. The blot was probed 
with anticorin antibody, AbA1, and visualized using enhanced
chemiluminescence. The protein standards in kDa are as indicated.
Fig. 3. Corin is localized to human heart myocytes by immunostaining. Immunohistochemical staining of human heart tissues was performed using the
affinity purified anticorin peptide A1 or A2 polyclonal antibodies as primary antibodies. (A) A longitudinal section of a representative heart tissue from a
transplant recipient (Patient #7684) stained with AbA1 showing intense staining in the cardiac myocytes; (B) as (A) except the primary antibody was
preadsorbed with the immunogenic peptide, A1, for 2 h; (C) the same tissue as (A) except stained with the weaker staining antibody, AbA2. Apparent
staining at the poles of the nuclei represents deposits of the brown lipochrome pigment, lipofuscin. (D) The same tissue as (A-C) processed with no primary
antibody; (E) a longitudinal section of normal myocardium from a heart which contained an acute infarct elsewhere (Patient #A4-99R) stained with AbA1
showing intense staining corresponding to the cross striations; (F) staining of the same heart tissue as (E) with AbA1 showing intense staining in cross
section. Photomicrographs (A-E) were taken at an original magnification of 100×.



Fig. 4. Corin expression in neonate heart with TAPVR. Immunohistochemical staining of human neonate heart tissues was performed using the affinity
purified anticorin peptide A1 polyclonal antibody as the primary antibody. (A) and (C) Longitudinal sections of TAPVR heart tissue showing staining in the
cardiac myocytes, corresponding to the cross striations; (B) and (D) longitudinal sections of a normal neonate heart showing a similar staining pattern in the
cardiac myocytes. Photomicrographs (A) and (B) were taken at an original magnification of 100× and (C) and (D) at an original magnification of 40×.

Anti-corin antibodies detect corin in heart lysates

We generated polyclonal antibodies to two different peptides
derived from unique regions of the corin polypeptide
sequence in order to investigate its expression and localization
in the heart. The first was a unique region within the serine
protease catalytic domain between the conserved Asp and Ser
amino-acid residues (AbA1) and the second was contained
within the scavenger receptor domain (AbA2). Immunoblot
analysis of corin protein expression in human heart protein
lysates showed a major immunoreactive band of 125-135 kDa
(Fig. 2), which was not present in lysates from the negative
control HeLa cell line. This molecular mass is slightly lower
than that reported (150 kDa) for recombinant V5/His6-tagged
corin expressed by human embryonic kidney 293 cells
[20]. As the mature corin zymogen has a calculated mass of
116 kDa [19], it is likely that the mature corin polypeptide
undergoes a post-translational processing event, possibly
glycosylation. Consistent with this, there are 19 predicted
N-linked glycosylation sites present in the extracellular
domains of corin [19].
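The observed 125-135 kDa band exceeds the calculated 116 kDa by roughly 9-19 kDa. Predicted N-linked sites such as the 19 cited above are usually identified by scanning the sequence for the N-X-S/T sequon, where X is any residue except proline. The sketch below illustrates that motif rule only; it is not the prediction method used in [19], which is not described here, and a sequon match says nothing about whether a site is actually glycosylated. The toy sequence is invented for illustration; the two antigenic peptides are taken from the Methods with the added C-terminal cysteine omitted.

```python
import re

# N-X-S/T sequon: asparagine, any residue except proline, then serine or threonine.
# The zero-width lookahead lets overlapping sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str) -> list[int]:
    """1-based positions of asparagines that sit in an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# The two antigenic peptides from the Methods (C-terminal Cys omitted).
A1 = "IQEQEKEPRWLTLHSNWE"
A2 = "GHMGNKMPFKLQEGE"

if __name__ == "__main__":
    print(sequon_positions("ANKSNPTNGT"))  # toy sequence, not from corin
    print(sequon_positions(A1), sequon_positions(A2))
```

On the toy sequence the scan reports positions 2 and 8 and skips the N-P-T at position 5 because of the proline rule; neither antigenic peptide contains a sequon.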



Corin is expressed by human heart myocytes 

To investigate the localization of corin expression in human 
heart, immunohistochemical analyses were performed on 
human adult heart tissues. Corin was abundantly expressed 
in cardiac myocytes, with intense brown staining associated 
with cross striations seen in longitudinally sectioned myofibers 
(Fig. 3A). In some areas there was accentuation of the plasma 
membrane, consistent with an integral membrane localization 
of corin. This same pattern of staining was observed in sections 
taken from all areas of the myocardium. Control slides using 
the AbA1 polyclonal antibody in the presence of competing
A1 peptide showed absence of this specific staining pattern
(Fig. 3B). An identical, albeit weaker, staining pattern was
observed in experiments performed using the second
corin-specific antibody (AbA2) (Fig. 3C). No staining was detected
in the absence of antibody (Fig. 3D). Staining of a section of
viable myocardium from a heart containing an acute
myocardial infarct showed a similar intense staining of the striations
in cardiac myocytes (Fig. 3E) and a pinhead-like dot pattern 
when viewed in cross section (Fig. 3F). Necrotic heart tissue 
showed similar but much less intense staining (data not shown). 
Corin was not detected in sections of skeletal or smooth muscle 
(data not shown), suggesting that the function of corin is 
specifically related to cardiac muscle. 



Corin protein expression in a patient with the congenital 
heart disease, TAPVR 

The molecular mechanisms responsible for the developmental
defect associated with the rare congenital heart disease TAPVR
are not known. The location of the corin gene on human
chromosome 4p12-13 [19] and the localization of the TAPVR
locus to a 30 centimorgan interval on 4p13-q12 [26] suggested
that corin may be a candidate for the TAPVR gene [19]. If corin
plays a role in TAPVR, its expression may be lost or altered in
TAPVR heart tissue. To explore this possibility, we examined
corin protein expression in a TAPVR heart. The pattern of corin
expression detected in this heart tissue (Fig. 4A,C) was similar
to that observed in the adult heart and was identical to the
pattern of corin staining in an age-matched neonate control
heart (Fig. 4B,D). While these data are not consistent with a role
for corin in TAPVR, they do not exclude the possibility that
TAPVR is associated with more subtle alterations to the corin
gene, for example point mutations, that would not be detected
by this method.

Fig. 5. Diagram showing domain structures of corin compared with other mosaic integral membrane proteins. The domains are as indicated. The
catalytic serine protease residues are circled. The disulfide bond linking catalytic and pro-regions is marked.

Corin homology to other type II transmembrane proteases 

As illustrated in Fig. 5, corin is a mosaic integral membrane 
protein possessing discrete domains. The intracellular, cyto- 
plasmic domain contains two potential protein kinase C phos- 
phorylation sites which may represent mechanisms for signal 
relay to or from the cell surface. Corin contains two frizzled 
domains. These domains function in other molecules as 
receptors for Wnt proteins, which are implicated in signal 
transduction during development [28]. Corin possesses eight 
LDL receptor domains which can mediate uptake of LDLs [29] 
and have also been shown to be involved in binding and 
internalization of protease/inhibitor complexes [30]. LDLs 
regulate the transport of cholesterol and play a major role in 
the development of heart disease. Corin possesses a scavenger 
receptor domain, which, in other proteins, binds polyanionic
molecules including modified lipoproteins, cell surface lipids 
and some sulfated polysaccharides [31]. The trypsin-like serine 
protease domain is located at the C-terminus. 

Corin bears similarity to other known members of the 
integral membrane serine proteases as illustrated in Fig. 5. The 
corin serine protease domain is highly homologous to a 
multidomain integral-membrane serine protease found in the 
brush border of the intestine, enteropeptidase [32]. Entero- 
peptidase functions to activate digestive pancreatic enzymes 
released into the intestine. Activation of this cascade is critical,
as illustrated by the life-threatening intestinal malabsorption 
that accompanies congenital deficiency of enteropeptidase [32]. 
Other proteases with homology to the corin serine protease 
domain are the integral-membrane serine proteases, TMPRSS2 
and hepsin. Hepsin is a hepatic serine protease that has been 
demonstrated to activate Factor VII in the extrinsic blood 
coagulation pathway leading to thrombin formation, and has 
further been shown to be required for mammalian cell growth 
[33]. 

In summary, we have confirmed heart as a site of abundant 
corin mRNA expression and demonstrated for the first time the 
expression of corin as a 125-135 kDa protein in this tissue. In 



addition, in heart we have localized corin protein to myocytes; 
the same cardiac cells expressing pro-ANP. These data support 
recently reported in vitro evidence that the corin proteolytic 
domain is the pro-ANP convertase [20] and thus, the proposal 
that corin has a role in regulating blood pressure. Possible 
additional functions of the serine protease domain and the 
functions of the other corin domains are not yet known. The 
putative phosphorylation sites in the cytoplasmic domain of 
corin may indicate that the intracellular domain of corin will be 
a target for phosphorylation and therefore may mediate 
signalling events from the cell surface. A better understanding 
of the role of corin in heart will provide insight into basic 
molecular mechanisms of cardiac function and could provide a 
rational target for both diagnostic and therapeutic applications. 
INHIBITOR RESISTANT SERINE
PROTEASES

The present invention relates to serine proteases of the
chymotrypsin superfamily which have been modified so that
they exhibit resistance to serine protease inhibitors. The
invention also relates to the precursors of such compounds,
their preparation, to nucleic acid coding for them and to their
pharmaceutical use.

Serine proteases are endopeptidases which use serine as
the nucleophile in peptide bond cleavage. There are two
known superfamilies of serine proteases and these are the
chymotrypsin superfamily and the Streptomyces subtilisin
superfamily (Barrett, A. J., in: Proteinase Inhibitors, Ed.
Barrett, A. J. et al., Elsevier, Amsterdam, pp 3-22 (1986) and
James, M. N. G., in: Proteolysis and Physiological
Regulation, Ed. Ribbons, D. W. et al., Academic Press, New
York, pp 125-142 (1976)).

The present invention is particularly concerned with
serine proteases of the chymotrypsin superfamily, which
includes such compounds as plasmin, tissue plasminogen
activator (t-PA), urokinase-type plasminogen activator
(u-PA), trypsin, chymotrypsin, granzyme, elastase, acrosin,
tonin, myeloblastin, prostate-specific antigen (PSA),
gamma-renin, tryptase, snake venom serine proteases,
adipsin, protein C, cathepsin G, complement components
C1R, C1S and C2, complement factors B, D and I, chymase,
hepsin, medullasin and proteins of the blood coagulation
cascade including kallikrein, thrombin, and Factors VIIa,
IXa, Xa, XIa and XIIa. Members of the chymotrypsin
superfamily have amino acid and structural homology in their
catalytic domains, although a comparison of the sequences
of the catalytic domains reveals the presence of insertions or
deletions of amino acids. However, these insertions and
deletions map to the surface of the folded molecule and thus
do not affect the basic structure, although it is likely that they
contribute to the specificity of interactions of the molecule
with substrates and inhibitors (Strassburger, W. et al., FEBS
Lett. 157, 219-223 (1983)).

Serine protease inhibitors are also well known and are
divided into the following families: the bovine pancreatic
trypsin inhibitor (BPTI) family, the Kazal family, the
alpha-2-macroglobulin (A2M) family, the Streptomyces subtilisin
inhibitor (SSI) family, the serpin family, the Kunitz family,
the four-disulphide core family, the potato inhibitor family
and the Bowman-Birk family.

Serine protease inhibitors inhibit their cognate serine
proteases and form stable 1:1 complexes with these
proteases. Structural data are available for several
protease-inhibitor complexes including trypsin-BPTI,
chymotrypsin-ovomucoid inhibitor and chymotrypsin-potato inhibitor
(Read, R. J. et al., in: Proteinase Inhibitors, Ed. Barrett, A.
J. et al., Elsevier, Amsterdam, pp 301-336 (1986)). A
structural feature which is common to all the serine protease
inhibitors is a loop extending from the surface of the
molecule which contains the recognition sequence for the
active site of the cognate serine protease and, in fact, there
is remarkable similarity in the specific interactions between
different inhibitors and their cognate serine proteases,
despite the diverse sequences of the inhibitors.

The serine proteases of the chymotrypsin superfamily
play an important role in human and animal physiology.
Some of the most important serine proteases are
those which are involved in blood coagulation and
fibrinolysis. In the process of blood coagulation, a cascade of
enzyme activities is involved in generating a fibrin network
which forms the framework of a clot or thrombus.
Degradation of the fibrin network (fibrinolysis) involves the
protease plasmin. Plasmin is formed in the body from
its inactive precursor plasminogen by cleavage of the
peptide bond between arginine 561 and valine 562 of
plasminogen. This reaction is catalysed by t-PA or by u-PA.

If the balance between the clotting and fibrinolytic
systems becomes locally disturbed, intravascular clots may
form at inappropriate locations leading to conditions such as
coronary thrombosis and myocardial infarction, deep vein
thrombosis, stroke, peripheral arterial occlusion and
embolism. A known way of treating such conditions is to
administer to a patient a serine protease of the chymotrypsin
superfamily or the precursor of such an enzyme. For
example, t-PA, u-PA and plasminogen in the form of
anisoylated plasminogen complexed with streptokinase are used in
the treatment of myocardial infarction; plasminogen is used
to supplement the natural circulatory plasminogen level to
enhance thrombolytic therapy; and protein C is used as an
antithrombotic agent. Serine proteases of the chymotrypsin
superfamily, for example Factors VIIa and IX, are
administered for induction of blood clotting in disorders such as
haemophilia. A major problem with the use of all of these
agents in this type of therapy is their rapid neutralisation by
serine protease inhibitors, which reduces the efficiency of the
therapy and increases the dose of agent required. It would
therefore be advantageous to develop modified analogues of
these endopeptidases which are resistant to inactivation by
serine protease inhibitors whilst maintaining their activity.
However, it is not easy to predict modifications which will
result in increased resistance to inhibition without significant
decrease in endopeptidase activity.

WO-A-9010649 discloses serine proteases of the
chymotrypsin superfamily which have been modified and which
are said to have increased resistance to serine protease
inhibitors. The authors of that document have studied the
known structure of the complex between trypsin and BPTI
and have realised that, other than the amino acids in the
major recognition site, the amino acids of trypsin that make
direct contact with BPTI are located in the region between
residues 37 and 41 and in the region between residues 210
and 213 of the polypeptide chain. The authors have then
extrapolated from this on the basis that there is a high degree
of structural homology between the catalytic domains of
serine proteases and have suggested that mutation of a
residue in any serine protease equivalent to the Tyr-39
residue in trypsin would lead to increased resistance of the
modified analogue compared with the wild-type serine
protease. They also suggest that inhibition-resistant t-PA
analogues can be made by mutation of an additional stretch of
seven amino acids which occurs in t-PA, but not in trypsin,
adjacent to the predicted contact point at Arg-304
(equivalent to Tyr-39 of trypsin). However, although the
catalytic domains of members of the chymotrypsin
superfamily of serine proteases do, in general, have sequence and
structural homology, Tyr-39 of trypsin is on a loop structure
on the surface of the protein and, as is shown in FIG. 1, the
equivalent regions of other serine proteases are highly
variable within the superfamily. Indeed, this is
acknowledged in WO-A-9010649. It is, therefore, by no means
evident that the specific conformation of the loop in this
region of the protein is conserved between different serine
proteases, especially in cases where the number of residues
in the loop differs, as is the case for trypsin and plasmin.
Thus, although the residues in the region may be aligned
sequentially because of the alignment of their flanking
regions which do have similar sequences, it is not at all
evident that their side-chains are in equivalent spatial
locations and, therefore, residues which are equivalent in a
sequence alignment are not necessarily able to form
equivalent interactions in the folded protein. If plasmin is taken as
an example, it can be seen from FIG. 1 that there are three
hydrophobic residues (Phe-22, Met-24 and Phe-26) which
could be involved in a similar hydrophobic interaction to
that of Tyr-39 in the trypsin/BPTI complex. The numbering
of the plasmin residues just mentioned is the numbering of
SEQ ID No 2, which depicts the protease domain of plasmin.
The residue designated 1 in SEQ ID No 2 is at position 562
of the mature protein. A study of FIG. 1 shows that any of
these residues could be equivalent to Tyr-39 of trypsin, which
occurs at position 29 in the numbering system of FIG. 1.
Clearly, therefore, the method described in WO-A-9010649
for designing a protease which is resistant to inhibition is not
wholly reliable and it would be preferable to design inhibi- 
tlon resistant mutants in a different way. 
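The distinction drawn above, between residues that are equivalent in a sequence alignment and residues that are equivalent in space, can be made concrete by looking at how alignment equivalence is read off. The Python sketch below is illustrative only: the two aligned strings are hypothetical toy sequences, not the FIG. 1 alignment, and the helper names are our own. It pairs residues purely by alignment column, which is exactly the kind of equivalence the text warns may not reflect equivalent side-chain positions, and it shows how a loop insertion leaves a residue with no partner at all.

```python
from typing import Dict, Optional


def residue_at_column(aligned: str, first_number: int = 1) -> Dict[int, Optional[int]]:
    """Map 1-based alignment columns to residue numbers; gap columns ('-') map to None."""
    mapping: Dict[int, Optional[int]] = {}
    number = first_number
    for column, symbol in enumerate(aligned, start=1):
        if symbol == "-":
            mapping[column] = None
        else:
            mapping[column] = number
            number += 1
    return mapping


def aligned_partner(aligned_a: str, aligned_b: str, residue_a: int,
                    first_a: int = 1, first_b: int = 1) -> Optional[int]:
    """Residue of sequence B in the same alignment column as residue_a of A.

    Returns None when B has a gap in that column (an insertion in A).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    cols_a = residue_at_column(aligned_a, first_a)
    cols_b = residue_at_column(aligned_b, first_b)
    for column, number in cols_a.items():
        if number == residue_a:
            return cols_b[column]
    raise ValueError(f"residue {residue_a} not present in sequence A")


# Hypothetical toy alignment (not FIG. 1): B has an extra residue (W) at
# column 3 and A has an extra residue (E) at column 5.
A = "AC-DEFG"
B = "ACWD-FG"

print(aligned_partner(A, B, 3))  # D of A pairs with D, residue 4 of B
print(aligned_partner(A, B, 4))  # E of A faces a gap in B, so None
```

Note that residue numbers and column numbers diverge as soon as either sequence has a gap, which is why a position in the FIG. 1 numbering need not equal the corresponding SEQ ID No 2 number.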

The present inventors have realised that, because the
serine protease inhibitors are structurally homologous in
their active centre loop and form similar interactions with
their cognate serine proteases (Read, R. J. et al., in:
Proteinase Inhibitors, Ed. Barrett, A. J. et al., Elsevier,
Amsterdam, pp 301-336 (1986)), mutations in any given
serine protease which result in resistance to inhibition by a
serine protease inhibitor may be applicable to mutations of
spatially or sequentially equivalent residues in any other
member of the chymotrypsin superfamily.

The interaction between enzyme and inhibitor respon- 
sible for inhibition of enzyme activity involves the catalytic 
site amino acids of the enzyme and the reactive site amino 
acids of the inhibitor. This principal interaction is stabilised
by other interactions between the molecules. Although there
is a comparatively large surface of interaction between the
protease and the inhibitor, the protease/inhibitor complex is
mainly stabilised by a few key interactions. These are
exemplified by the interactions observed in the
protease/inhibitor complex between trypsin and BPTI (Huber,
R. et al., J. Mol. Biol. 89:73-101 (1974)), which serves as a
model for the interaction between the catalytic domains of
other serine proteases and their cognate inhibitors. In the
trypsin/BPTI complex, the key residues of the protease which
interact with the inhibitor, apart from those in the principal
recognition site, are residues 37-41 and 210-213 (chymotrypsin
numbering), with Tyr-39 being the most important. This
interaction served as the basis for WO-A-9010649, in which
the spatially equivalent residues in the t-PA/PAI-1 complex
were identified and inhibitor-resistant mutants were
described.

In contrast to the disclosure of WO-A-9010649, the present
inventors have realised that the desired disruption of the
protease/inhibitor interactions which leads to inhibitor
resistance need not be caused by mutating the specific
residues identified in that document or their equivalents in
other serine proteases. Instead, residues in spatial, rather
than sequential, proximity to these key residues may be
mutated, resulting in a less stable complex between the
protease and the inhibitor.

In a first aspect of the present invention, there is provided 
a modified endopeptidase of the chymotrypsin superfamily 
of serine proteases or a precursor of such an endopeptidase, 
which is resistant to serine protease inhibitors, characterised 
in that the modification comprises the mutation of one or
more residues in close spatial proximity (other than
sequential proximity) to a site of interaction between the
protease and a cognate protease inhibitor.

In the context of this invention, the term 'precursor', 
when used in relation to a serine protease, refers to a protein
which is cleavable by an enzyme to produce an active serine 
protease. 



Mutations resulting in resistance to the inhibitor may 
induce: 

i) a conformational change in the local fold of the protease 
such that the resulting complex with the inhibitor is less
stable than the equivalent complex between the inhibi- 
tor and the wild- type protein; 

ii) a change in the relative orientations of the protease and 
inhibitor on forming a complex such that the resulting 
complex is less stable than the equivalent complex 
between the inhibitor and the wild-type protein; 

iii) a change in the steric bulk of the protease in the region
of the inhibitor-binding site such that the resulting 
complex is less stable than the equivalent complex 
between the inhibitor and the wild-type protein; 

iv) a change in the electrostatic potential field in the 
region of the inhibitor-binding site such that the resulting
complex is less stable than the equivalent complex
between the inhibitor and the wild-type protein; or 

v) any combination of the above. 

The residues to be mutated need not be sequentially close
to the key residues involved in the protease/inhibitor 
interaction, since the three-dimensional folding of the pro- 
tease chain brings sequentially distant residues into spatial 
proximity. It is necessary to select the residues for mutation 
based on a model of either the protease used to generate the 
mutant, or of another member of the chymotrypsin superfamily
of serine proteases. Where the three-dimensional
structure of the protease to be mutated is not known, the 
selection of residues for mutation may be based either on a 
three-dimensional model of the protein to be mutated 
derived using homology modelling or other techniques, or 
on sequence alignments between the protein to be mutated 
and other members of the chymotrypsin superfamily of 
serine proteases with known three-dimensional structures. If 
sequence alignments are employed, it is not necessary to 
generate a three-dimensional structural model of the pro- 
tease of interest in order to select residues for mutation to 
give inhibitor resistance, as spatial proximity to the key 
residues can be inferred from those proteins in the alignment 
with known three-dimensional structures. The spatial rela- 
tionships between the residues to be mutated and the key 
residues in the protease/inhibitor interaction may be inferred 
by any appropriate method. Suitable methods are known to 
those skilled in the art.
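One simple way to make "spatial rather than sequential proximity" operational on a three-dimensional model is a distance filter that also excludes near neighbours in the chain. The Python sketch below is an illustration under stated assumptions, not the inventors' procedure: `TOY_COORDS` holds hypothetical coordinates standing in for one representative atom per residue taken from a model, and the 8 Å cutoff and the three-residue sequence window are assumed parameters chosen only for the example.

```python
import math

# Hypothetical toy model: residue number -> (x, y, z) in angstroms.
TOY_COORDS = {
    10: (0.0, 0.0, 0.0),   # stands in for a key contact residue
    11: (3.8, 0.0, 0.0),   # sequential neighbour of residue 10
    12: (7.6, 0.0, 0.0),   # sequential neighbour of residue 10
    40: (2.0, 3.0, 0.0),   # distant in sequence, close in space
    41: (20.0, 0.0, 0.0),  # distant in sequence and in space
    80: (1.0, -4.0, 1.0),  # distant in sequence, close in space
}


def spatial_neighbours(coords, key_residues, cutoff=8.0, min_seq_sep=3):
    """Residues within `cutoff` angstroms of any key residue, excluding the
    key residues themselves and any residue within `min_seq_sep` positions
    of a key residue in the chain (spatial, not sequential, proximity)."""
    hits = []
    for res, xyz in sorted(coords.items()):
        if res in key_residues:
            continue
        if any(abs(res - key) <= min_seq_sep for key in key_residues):
            continue  # sequential neighbour: excluded by definition
        if any(math.dist(xyz, coords[key]) <= cutoff
               for key in key_residues if key in coords):
            hits.append(res)
    return hits


print(spatial_neighbours(TOY_COORDS, {10}))  # [40, 80]
```

In practice the coordinates would come from a crystal structure or a homology model, and distances might be measured between side-chain atoms rather than a single point per residue; both choices are assumptions here.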

The modified serine protease may be any serine protease 
of the chymotrypsin superfamily since all of these enzymes 
have a common mechanism of action. Examples of serine
proteases which can be modified according to the
present invention are as follows:

plasmin, tissue plasminogen activator (t-PA),
urokinase-type plasminogen activator (u-PA), trypsin,
chymotrypsin, granzyme, elastase, acrosin, tonin, myeloblastin,
prostate-specific antigen (PSA), gamma-renin, tryptase, snake
venom serine proteases, adipsin, protein C, cathepsin G,
complement components C1r, C1s and C2, complement factors B,
D and I, chymase, hepsin, medullasin and proteins of the
blood coagulation cascade including kallikrein, thrombin,
and Factors VIIa, IXa, Xa, XIa and XIIa.

However, modified analogues of plasmin, t-PA, u-PA, 
activated protein C, thrombin, factor VIIa, factor IXa, factor
Xa, factor XIa and factor XIIa are particularly useful, as is
a modified version of plasminogen, since all of these com- 
pounds can be used as fibrinolytic or thrombotic agents. An 
inhibition resistant plasmin analogue is particularly pre- 
ferred. 

The serine protease inhibitor to which the modified serine
protease of the invention is resistant will obviously depend
on which serine protease has been modified. In the case of
plasmin, the primary physiological inhibitor is
α2-antiplasmin, which belongs to the serpin family of serine
protease inhibitors. The reaction between plasmin and
α2-antiplasmin consists of two steps: a very fast reversible
reaction between the kringle 1 lysine binding site of plasmin
and the carboxy-terminal region of the inhibitor, followed by
a reaction between the catalytic site of plasmin and the
reactive site of the inhibitor which results in the formation
of a very stable 1:1 stoichiometric, enzymatically inactive
complex (Holmes, W. E. et al., J. Biol. Chem. 262,
1659-1664 (1987)). Therefore, when the serine protease is
plasmin, it is particularly useful if the serine protease
inhibitor to which the plasmin is resistant is α2-antiplasmin.
Plasmin is also inhibited by α2-macroglobulin and
α1-antitrypsin, and resistance to inhibition by these
inhibitors is also useful.

From a three-dimensional model of the plasmin/antiplasmin
complex (described in Method 1), it has been determined
that, in plasmin, the residues which are in close spatial
proximity to the key residues of interaction between the
protease and the inhibitor are residues 17-20, 44-54, 62,
154, 158 and 198-213. The numbering used above is the
numbering system of SEQ ID No 2, which represents the
protease domain of plasmin and begins at position 562 of the
mature protein. In order to be resistant to inhibition by a
serine protease inhibitor such as α2-antiplasmin, it is
necessary to modify plasmin in one or more of these regions.
Protease inhibition resistance can be induced in other serine
proteases of the chymotrypsin superfamily by modifying
equivalent regions of these proteins. FIG. 1 shows the
sequences of the protease domains of a variety of proteases
and, from a study of FIG. 1, it is clear where modifications
should be made in order to induce resistance to protease
inhibitors. In the numbering system of FIG. 1, the
modification regions just mentioned occur at residues 17-22,
49-64, 72, 203, 214 and 264-281. The types of mutations
which are suitable for inducing resistance to inhibition
include single or multiple amino acid substitutions,
additions or deletions. However, amino acid substitutions are
particularly preferred.

In plasmin, examples of amino acid substitution mutations
which result in a modified response to inhibition by
α2-antiplasmin, using the numbering system of SEQ ID No 2,
are Glu-62 to Lys or Ala, Ser-17 to Leu, Arg-19 to Glu or
Ala, and Glu-45 to Lys, Arg or Ala. Resistance to protease
inhibition can be induced in other serine proteases by
making modifications at equivalent positions. The degree of
resistance to inhibition may be altered by making either
single or multiple mutations in the protease, or by altering
the nature of the amino acid used for substitution.
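Because residue 1 of SEQ ID No 2 is position 562 of the mature protein, SEQ ID No 2 positions convert to the plasminogen numbering used in the Examples by adding 561. The Python sketch below assumes only that fixed offset and the 791-residue numbering in which the activation bond is Arg-561/Val-562 (the 790-residue numbering of Sottrup-Jensen et al. is shifted by one). It applies to the protease domain only and does not convert FIG. 1 alignment positions, which include gaps.

```python
# SEQ ID No 2 residue 1 corresponds to Val-562 of mature plasminogen,
# the residue released by cleavage of the Arg-561/Val-562 bond.
SEQ2_OFFSET = 561


def seq2_to_plasminogen(pos: int) -> int:
    """Convert a SEQ ID No 2 residue number to mature-plasminogen numbering."""
    return pos + SEQ2_OFFSET


def plasminogen_to_seq2(pos: int) -> int:
    """Inverse conversion; only meaningful for protease-domain residues."""
    if pos <= SEQ2_OFFSET:
        raise ValueError("position lies outside the protease domain")
    return pos - SEQ2_OFFSET


def span_label(start: int, end: int) -> str:
    return str(start) if start == end else f"{start}-{end}"


# Regions in close spatial proximity to the plasmin/antiplasmin contact.
REGIONS = [(17, 20), (44, 54), (62, 62), (154, 154), (158, 158), (198, 213)]

# Substitutions named in the text, keyed by SEQ ID No 2 position.
SUBSTITUTIONS = {17: ("Ser", "Leu"), 19: ("Arg", "Glu or Ala"),
                 45: ("Glu", "Lys, Arg or Ala"), 62: ("Glu", "Lys or Ala")}

for start, end in REGIONS:
    mature = span_label(seq2_to_plasminogen(start), seq2_to_plasminogen(end))
    print(f"SEQ ID No 2 {span_label(start, end)} -> plasminogen {mature}")

for pos, (wild_type, new) in sorted(SUBSTITUTIONS.items()):
    print(f"{wild_type}-{pos} to {new}  (plasminogen {wild_type}-{seq2_to_plasminogen(pos)})")
```

The converted positions agree with the mutants described in the Examples: Glu-62 is the Glu-623 changed in A3 and A16, Glu-45 is the Glu-606 changed in A4, A14 and A15, Ser-17 is the Ser-578 of A5, and Arg-19 is the Arg-580 of A12.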

In addition to the modification of the invention, the serine
protease may be modified in other ways as compared to
wild-type proteins. Any modifications may be made to the
protein provided that it does not lose its activity.

As an alternative to a modified serine protease, it is also
possible to modify a precursor of the enzyme so that the
enzyme derived from the precursor will have the desired
resistance to inhibition. An example of a serine protease
precursor is plasminogen, which is the inactive precursor of
plasmin. Conversion of plasminogen to plasmin is
accomplished by cleavage of the peptide bond between
arginine 561 and valine 562 of plasminogen. Under
physiological conditions this cleavage is catalysed by t-PA
or u-PA. Cleavage of a modified plasminogen variant of the
present invention will produce a plasmin variant as
described above and it is, of course, preferable that the
plasminogen variant will be cleaved to produce one of the
preferred plasmin variants described above.

Again, as with serine proteases, the precursors may have 
other modifications. Analysis of the wild-type plasminogen 
molecule has revealed that it is a glycoprotein composed of 
a serine protease domain, five kringle domains and an 
N-terminal sequence of 78 amino acids which may be
removed by plasmin cleavage. Cleavage by plasmin
involves hydrolysis of the Arg(68)-Met(69), Lys(77)-Lys(78)
or Lys(78)-Val(79) bonds to create forms of plasminogen
with an N-terminal methionine, lysine or valine residue,
all of which are commonly designated as lys-plasminogen.
Intact plasminogen is referred to as glu-plasminogen
because it has an N-terminal glutamic acid residue.
Glycosylation occurs on residues Asn(289) and Thr(346) but the
extent and composition are variable, leading to the presence 
of a number of different molecular weight forms of plasmi- 
nogen in the plasma. Any of the above plasminogen variants 
may be modified to produce a variant according to the 
present invention. The protein sequencing studies of 
Sottrup-Jensen et al. (in: Atlas of Protein Sequence and
Structure (Dayhoff, M. O., ed.) 5 suppl. 3, p. 95 (1978))
indicated that plasminogen was a 790 amino acid protein
and that the site of cleavage was the Arg(560)-Val(561)
peptide bond. A plasminogen variant which is suitable for
modification according to the present invention is a 791
residue protein with an extra Ile at position 65, encoded
by cDNA isolated by Forsgren et al. (FEBS Letters 213,
254-260 (1987)). The serine protease domain of any of these
plasminogen analogues can be recognised by its homology
with serine proteases and, on activation to plasmin, is the
catalytically active domain involved in fibrin degradation.
The five kringle domains are homologous to those in other
plasma proteins such as tPA and prothrombin and are 
involved in fibrin binding and thus localisation of plasmi- 
nogen and plasmin to thrombi. 

The plasminogen analogues of the present invention may 
also contain other modifications (as compared to wild-type 
glu-plasminogen) which may be one or more additions, 
deletions or substitutions. Examples of particularly suitable 
plasminogen analogues are disclosed in our copending 
applications WO-A-9109118 and GB 9222758.6 and com- 
prise plasminogen analogues which are cleavable by an 
enzyme involved in blood clotting to produce active
plasmin. These plasminogen analogues may, according to the
present invention, be further modified so that, on cleavage,
the plasmin which is produced is resistant to inhibition by
serine protease inhibitors such as α2-antiplasmin. Other
plasminogen analogues which may be modified to produce
the plasminogen analogues of the invention are analogues in
which there has been an addition, removal, substitution or
alteration of one or more kringle domains. Other suitable
plasminogen analogues are Lys-plasminogen variants in
which the amino terminal 68, 77 or 78 amino acids have
been deleted. Such variants may have enhanced fibrin
binding activity, as has been observed for lys-plasminogen
compared to wild-type glu-plasminogen (Bok, R. A. and
Mangel, W. F., Biochemistry 24, 3279-3286 (1985)). Also
included within the scope of the invention are
plurally-modified plasminogen analogues which include one or
more modifications to prevent, reduce or alter glycosylation patterns.
Such analogues may have a longer half-life, reduced plasma 
clearance and/or higher specific activity. 

The modified serine proteases and serine protease precur- 
sors of the invention can be prepared by any suitable method 
and, in a second aspect of the invention, there is provided a 
process for the preparation of such a serine protease or serine 
protease precursor, the process comprising coupling
together successive amino acid residues and/or ligating
oligopeptides. Although the proteins may, in principle, be
synthesised wholly or partly by chemical means, it is
preferred to prepare them by ribosomal translation, preferably
in vivo, of a corresponding nucleic acid sequence. The
process may further include an appropriate glycosylation
step.

It is preferred to produce proteins of the invention using 
recombinant DNA technology. DNA encoding a naturally 
occurring serine protease or precursor may be obtained from 
a cDNA or genomic clone or may be synthesised. Amino 
acid substitutions, additions or deletions are preferably 
introduced by site-specific mutagenesis. DNA sequences 
encoding glu-plasminogen, lys-plasminogen, other plasmi- 
nogen analogues and serine protease variants may be 
obtained by procedures familiar to those skilled in the art of
genetic engineering. 

The process for producing proteins using recombinant
DNA technology will usually include the steps of inserting
a suitable coding sequence into an expression vector and
transfecting the vector into a suitable host cell. Therefore, in
a third aspect of the invention there is provided nucleic acid
coding for a modified serine protease as described above.
The nucleic acid may be either DNA or RNA and may be in
the form of a vector such as a plasmid, cosmid or phage. The
vector may be adapted to transfect or transform prokaryotic
cells, such as bacterial cells, and/or eukaryotic cells, such as
yeast or mammalian cells. The vector may be a cloning
vector or an expression vector and comprises a cloning site
and, preferably, at least one marker gene. An expression
vector will additionally have a promoter operatively linked
to the sequence to be inserted into the cloning site and,
preferably, a sequence enabling the protein product to be
secreted.

Most of the proteins of the present invention, including
molecules such as t-PA, can easily be obtained by inserting
the coding sequence into an expression vector as described
and transfecting the vector into a suitable host cell, which
may be a bacterium such as E. coli, a eukaryotic
microorganism such as yeast or a higher eukaryotic cell. With
molecules such as plasminogen which are unusually difficult
to express, it may be necessary to use a vector of the type
described in our copending application WO-A-9109118,
which comprises a first nucleic acid sequence coding for the
modified serine protease, operatively linked to a second
nucleic acid sequence containing a strong promoter and
enhancer sequence derived from human cytomegalovirus, a
third nucleic acid sequence encoding a polyadenylation
sequence derived from SV40 and a fourth nucleic acid
sequence coding for a selectable marker expressed from an
SV40 promoter and having an additional SV40
polyadenylation signal at the 3' end of the selectable marker
sequence. Such a vector may either comprise a single nucleic
acid molecule or a plurality of such molecules so that, for
example, the first, second and third sequences may be
contained in a first nucleic acid molecule and the fourth
sequence may be contained in a second nucleic acid
molecule. This vector is particularly useful for the
expression of plasminogen and plasminogen analogues.

For any of the proteins of the invention, the vector is
preferably chosen so that the protein is expressed and
secreted into the cell culture medium in a biologically active
form without the need for any additional biological or
chemical procedures. In the case of plasminogen, this can be
achieved using the vector described above.

In a further aspect of the invention there is provided a
process for the preparation of nucleic acid encoding a
modified serine protease which exhibits resistance to serine
protease inhibitors, the process comprising coupling
together successive nucleotides and/or ligating oligo- and/or
poly-nucleotides.

In a further aspect of the invention, there is provided a cell
transformed or transfected by a vector as described above.
Suitable cells or cell lines include both prokaryotic and
eukaryotic cells. A typical example of a prokaryotic cell is a
bacterial cell such as E. coli. Suitable eukaryotic cells
include yeast cells such as Saccharomyces cerevisiae or
Pichia pastoris. Other examples of suitable eukaryotic cells
are mammalian cells which grow in continuous culture, and
examples of such cells include Chinese hamster ovary
(CHO) cells, mouse myeloma cell lines such as
P3X63-Ag8.653 and NS0, COS cells, HeLa cells, 293 cells,
BHK cells, melanoma cell lines such as the Bowes cell line,
mouse L cells, human hepatoma cell lines such as HepG2,
mouse fibroblasts and mouse NIH 3T3 cells. CHO cells are
particularly suitable as hosts for the expression of
plasminogen and plasminogen analogues. The transformation
of the cells may be achieved by any convenient method, but
electroporation is a particularly suitable method.

For some molecules, such as plasminogen, there may be 
a low level of undesirable activation during culture. 
Therefore, in a further aspect of the invention, there is 
provided a eukaryotic host cell transfected or transformed
with a first DNA sequence encoding a serpin-resistant serine 
protease and with an additional DNA sequence encoding the 
cognate inhibitor. 

The modified serine proteases of the present invention 
have a variety of uses and, if the serine protease is a 
fibrinolytic or thrombolytic enzyme, it will be useful in a
method for the treatment and/or prophylaxis of diseases or 
conditions caused by blood clotting, the method comprising 
administering to a patient an effective amount of the serine 
protease. 

Therefore, in a further aspect of the invention, there is
provided a modified serine protease according to the first
aspect of the invention, which is a serine protease having
fibrinolytic, thrombolytic, antithrombotic or prothrombotic
properties, for use in medicine, particularly in the treatment
of diseases mediated by blood clotting. Such conditions
include myocardial and cerebral infarction, arterial and
venous thrombosis, thromboembolism, post-surgical 
adhesions, thrombophlebitis and diabetic vasculopathies. 

The invention also provides the use of a modified 
fibrinolytic, thrombolytic, antithrombotic or prothrombotic 
serine protease according to the first aspect of the invention 
in the preparation of an agent for the treatment and/or 
prophylaxis of diseases or conditions mediated by blood 
clotting. Examples of such conditions are mentioned above.

Furthermore, there is also provided a pharmaceutical or 
veterinary composition comprising one or more modified 
serine proteases of the first aspect of the invention together 
with a pharmaceutically and/or veterinarily acceptable car- 
rier. 

The composition may be adapted for administration by 
oral, topical or parenteral routes including intravenous or 
intramuscular injection or infusion. Suitable injectable com- 
positions may comprise a preparation of the compound in 
isotonic physiological saline and/or buffer and may also 
include a local anaesthetic to alleviate the pain of the
injection. Similar compositions may be used for infusions. If
the compound is administered topically, it may be formu- 
lated as a cream, ointment or lotion in a suitable base. 

The compounds of the invention may be supplied in unit 
dosage form, for example as a dry powder or water-free 
concentrate in a hermetically sealed container such as an
ampoule or sachet.

The quantity of material to be administered will depend 
on the amount of fibrinolysis or inhibition of clotting 
required, the required speed of action, the seriousness of the
thromboembolic position and the size of the clot. The
precise dose to be administered will, because of the very
nature of the condition which compounds of the invention
are intended to treat, be determined by the physician. As a
guideline, however, a patient being treated for a mature 
thrombus will generally receive a daily dose of a plasmino- 
gen analogue of from 0.01 to 10 mg/kg of body weight either 
by injection in for example up to 5 doses or by infusion. 

The invention will now be further described by way of 
example only with reference to the following drawings in 
which: 

FIG. 1 shows the alignment of the catalytic domain amino
acids of the chymotrypsin superfamily;

FIGS. 2a and 2b show maps of the pGWH and pGWHgP
vectors;

FIG. 3 shows the effect of α2-antiplasmin on the activity
of plasminogen mutant A3; and

FIG. 4 shows the sequence alignment of ovalbumin and
α2-antiplasmin used to generate the α2-antiplasmin model.

The following examples further illustrate the invention.

Examples 1 to 5 describe the expression of various
plasminogen analogues from higher eukaryotic cells and
Example 6 describes an assay used to assess resistance to
α2-antiplasmin.



EXAMPLE 1 
Construction and Expression of A1 and A12

The isolation of plasminogen cDNA and construction of 
the vectors pGWH and pGWHgP (FIG. 2) have been 
described in WO-A-9109118. In pGWHgP, transcription
through the plasminogen cDNA can initiate at the HCMV 
promoter/enhancer and the selectable marker gpt is 
employed. 

The techniques of genetic manipulation, expression and
protein purification used in the manufacture of the modified
plasminogen examples to follow are well known to those
skilled in the art of genetic engineering. A description of
most of the techniques can be found in one of the following
laboratory manuals: "Molecular Cloning" by T. Maniatis, E.
F. Fritsch and J. Sambrook, published by Cold Spring Harbor
Laboratory, Box 100, New York, or "Basic Methods in
Molecular Biology" by L. G. Davis, M. D. Dibner and J. F.
Battey, published by Elsevier Science Publishing Co Inc,
New York.

Additional and modified methodologies are detailed in the
methods section below.

Plasminogen analogues have been constructed which are
designed to be resistant to inhibition by α2-antiplasmin. A1
is a plasminogen analogue in which the amino acid Phe-587
is replaced by Asn. A12 is a plasminogen analogue in which
the Arg-580 is replaced by Glu. The modification strategy in
this example is essentially as described in WO-A-9109118
Example 3, with the mutagenesis reaction carried out on the
1.87 kb KpnI to HincII fragment of the thrombin activatable
plasminogen analogue T19 cloned into the bacteriophage
M13mp18. Single stranded template was prepared and the
mutation made by oligonucleotide directed mutagenesis. For
A1, a 24 base long oligonucleotide
5'GGTGCCTCCACAATTGTGCATTCC3' (SEQ. ID. 3) was used to
direct the mutagenesis, and for A12 a 27 base oligonucleotide
was used: 5'CCAAACCTTGnTCAAGACTGACITGC3' (SEQ ID
7).



Plasmid DNA was introduced into CHO cells by
electroporation using 800 V and 25 μF as described in the
methods section below. Selective medium (250 μg/ml
xanthine, 5 μg/ml mycophenolic acid, 1× hypoxanthine-thymidine
(HT)) was added to the cells 24 hours post
transfection and the media changed every two to three days.
Plates yielding gpt-resistant colonies were screened for
plasminogen production using an ELISA assay. Cells
producing the highest levels of antigen were re-cloned and the
best producers scaled up into flasks with production being
carefully monitored. Frozen stocks of all these cell lines
were laid down. Producer cells were scaled up into roller
bottles to provide conditioned medium from which
plasminogen protein was purified using lysine SEPHAROSE 4B.
(The word SEPHAROSE is a trade mark.)

EXAMPLE 2 

Construction and Expression of A3 and A16

The procedure of Example 1 was generally followed
except that the mutagenesis was performed on an EcoRV to
HindIII fragment (0.85 kb) containing the 3' end of wild-type
plasminogen cloned into M13. The oligonucleotide used
was a 27mer 5'GTTCGAGATTCACTTTTTGGTGTGCAC3'
(SEQ. ID. 4), which changed Glu-623 to Lys, thus
changing an acidic amino acid to a basic amino acid. The
resulting mutant was cloned as an EcoRV to SphI fragment
replacing the corresponding wild-type sequence. The 27 base
oligonucleotide 5'(jTTCGAGArTCACrG(jrTGGT(jTGCAC3'
(SEQ ID 10) was used to change Glu-623 to Ala to
produce A16.

EXAMPLE 3
Construction and Expression of A4, A14 and A15 

Mutant A4 is designed to disrupt ionic interactions on the
surface of plasminogen, preventing binding to antiplasmin.
The mutagenesis and sub-cloning strategy was as described
in Example 1, using a 24 base oligonucleotide
5'CTTGGGGACTTCTTCAAGCAGTGG3' (SEQ. ID. 5) designed to
convert Glu-606 to Lys. The 24 base oligonucleotide
5'CTTGGGGACTTGGCTAGACAGTGG3' (SEQ ID 8)
was used to change Glu-606 to Ala to produce A14, and the
25 base oligonucleotide 5'CTTGGGGACTTCCTTAGACAGTGGG3'
(SEQ ID 9) was used to change Glu-606 to Arg to produce
A15.

EXAMPLE 4 

Construction and Expression of A5 

Plasminogen analogue A5 was designed to alter the
positioning of the Tyr-39-containing structural loop and was
made generally as described in the procedure of Example 1.
In A5, Ser-578 has been replaced by Leu using the 24mer
5'CTCGTACGAAGCAGGACTTGCCAG3' (SEQ. ID. 6)
on the KpnI to EcoRV fragment of plasminogen in M13 as
the template. The mutation was cloned directly into
pGWHg.plasminogen using the restriction enzymes HindIII
and SphI. These sites had previously been introduced at
the extreme 5' end of plasminogen and at 1850 respectively
via mutagenesis; the plasminogen coding sequence was not
affected by this procedure.
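The mutagenic oligonucleotides in Examples 1 to 4 appear to be written on the antisense strand: their reverse complements translate, in one reading frame, to plasmin sequence carrying the intended substitution. The Python sketch below checks this under stated assumptions: it uses the standard genetic code, the oligonucleotide sequences as transcribed above (A12 and A16 are left out because their printed sequences are not fully legible), and expected peptide motifs that are our own reading of the surrounding SEQ ID No 2 residues rather than sequences stated in the text. All three forward frames are scanned because an oligonucleotide need not start on a codon boundary.

```python
from typing import List

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate whole codons only; a trailing partial codon is ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def sense_frames(antisense_oligo: str) -> List[str]:
    """Translate the reverse complement in all three forward frames."""
    coding = reverse_complement(antisense_oligo)
    return [translate(coding[frame:]) for frame in range(3)]


def encodes(antisense_oligo: str, motif: str) -> bool:
    return any(motif in peptide for peptide in sense_frames(antisense_oligo))


# mutant: (oligonucleotide 5'->3', expected peptide including the substitution)
OLIGOS = {
    "A1": ("GGTGCCTCCACAATTGTGCATTCC", "GMHNCGGT"),      # Phe-587 -> Asn
    "A3": ("GTTCGAGATTCACTTTTTGGTGTGCAC", "AHQKVNLE"),  # Glu-623 -> Lys
    "A5": ("CTCGTACGAAGCAGGACTTGCCAG", "WQVLLRT"),      # Ser-578 -> Leu
    "A14": ("CTTGGGGACTTGGCTAGACAGTGG", "HCLAKSP"),     # Glu-606 -> Ala
    "A15": ("CTTGGGGACTTCCTTAGACAGTGGG", "HCLRKSP"),    # Glu-606 -> Arg
}

for name, (oligo, motif) in OLIGOS.items():
    print(f"{name}: {len(oligo)} nt, encodes {motif}: {encodes(oligo, motif)}")
```

The lengths printed for each oligonucleotide can also be compared with the base counts quoted in the Examples (24, 27, 24, 24 and 25 bases respectively).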

EXAMPLE 5 

Construction and Expression of double mutant 

A3A4 

Plasminogen mutant A3A4 combines the two mutations 
A3 and A4 as described in Examples 2 and 3 respectively. 
Mutagenesis was performed on the EcoRV to SphI fragment
of A4 cloned into M13 using the A3 mutagenesis
oligonucleotide (SEQ ID 4).

EXAMPLE 6 

Plasmin-Antiplasmin Interaction Assays 

A chromogenic assay was used to assess the resistance of
the plasmin(ogen) mutants to inhibition by α2-antiplasmin.
Inhibition of plasmin activity was determined by the change
in the rate of cleavage of the plasmin chromogenic substrate
S2251 (Quadratech, P.O. Box 167, Epsom, Surrey, KT17
2SB).

Prior to assay, the plasminogens were activated to plasmin
using either urokinase for mutants in wild-type plasminogen,
or thrombin for thrombin activatable plasminogen mutants
(WO-A-9109118). Activation of wild-type plasminogen to
plasmin was achieved by incubation of the plasminogen (ca.
14 μg) with urokinase (16.8x10"^ U) in 1750 μl of assay
buffer (50 mM Tris, 0.1 mM EDTA, 0.00005% Triton X-100,
0.1% (w/v) human serum albumin, pH 8.0) at 37° C. for 5
mins. Activation of thrombin activatable plasminogen
mutants to plasmin was achieved by incubation of the
plasminogen (ca. 14 μg) with thrombin in 1750 μl of assay
buffer at 37° C. Hirudin was added to inhibit the thrombin
activity, as thrombin also cleaves the chromogenic substrate.

Plasmin (125 µl) was mixed in a cuvette with 250 µl S2251
(2 mg/ml in assay buffer) and either 125 µl antiplasmin (1.25 µg
in assay buffer, #4032 American Diagnostica Inc., 222 Railroad
Avenue, P.O. Box 1165, Greenwich, Conn. 06836-1165) or
125 µl assay buffer, and the absorbance at 405
nm was measured over time.
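The cuvette recipe above implies the final concentrations computed below. This minimal sketch assumes that the ca. 14 µg of plasminogen in 1750 µl is fully converted to plasmin and that the activator adds negligible volume. The helper name `assay_mix` is hypothetical.

```python
def assay_mix(plasminogen_ug=14.0, activation_ul=1750.0, plasmin_ul=125.0,
              s2251_ul=250.0, s2251_mg_per_ml=2.0,
              antiplasmin_ul=125.0, antiplasmin_ug=1.25):
    """Final concentrations in the cuvette.

    Assumes complete activation and negligible activator volume; for the
    buffer-only control, pass antiplasmin_ug=0.0."""
    total_ul = plasmin_ul + s2251_ul + antiplasmin_ul
    plasmin_ug = plasminogen_ug * plasmin_ul / activation_ul
    return {
        "total_ul": total_ul,
        "plasmin_ug": plasmin_ug,
        # (µg / µl) x 1000 = µg / ml
        "plasmin_ug_per_ml": 1000.0 * plasmin_ug / total_ul,
        "antiplasmin_ug_per_ml": 1000.0 * antiplasmin_ug / total_ul,
        "s2251_mg_per_ml": s2251_mg_per_ml * s2251_ul / total_ul,
        "antiplasmin_to_plasmin_mass_ratio": antiplasmin_ug / plasmin_ug,
    }


for key, value in assay_mix().items():
    print(f"{key}: {value:g}")
```

With the values from the text, this gives a 500 µl mix containing about 1 µg plasmin (2 µg/ml), 2.5 µg/ml antiplasmin and 1 mg/ml S2251.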

A Beckman DU64 spectrophotometer and Beckman
"Data Leader" data capture software were used to record
absorbance at 405 nm at 1 sec intervals for 8 minutes. The
Data Leader software package was used to calculate the first
derivative of the data, giving the rate of change of
absorbance at 405 nm against time, which is an estimate of active
plasmin concentration against time. Wild-type plasmin was
rapidly inactivated by α2-antiplasmin; after only 15 seconds
the plasmin was essentially inactivated. In contrast, plasminogen
mutant A3 has an antiplasmin-resistant phenotype and
is only slowly inactivated by antiplasmin, with a t½ (the time at
which the rate of OD change falls to half its value at t = 15 sec)
of approximately 75 seconds (FIG. 3).
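The first-derivative step and the t½ read-out can be sketched as follows. The Data Leader algorithm is not described, so this sketch uses plain finite differences. It reads t½ as the time at which the rate first drops to half its value at t = 15 s, which is our interpretation of the text. The trace is synthetic, built so that active plasmin decays exponentially with a 60 s half-life; it is not the data of FIG. 3.

```python
import math


def first_derivative(times, values):
    """dA/dt by finite differences: central inside, one-sided at the two ends."""
    n = len(values)
    if n < 2 or len(times) != n:
        raise ValueError("need at least two paired (time, value) points")
    rates = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        rates.append((values[hi] - values[lo]) / (times[hi] - times[lo]))
    return rates


def half_rate_time(times, rates, t_ref=15.0):
    """Time at which the rate first falls to half its value at t_ref,
    linearly interpolated between samples; None if it never does."""
    i_ref = min(range(len(times)), key=lambda i: abs(times[i] - t_ref))
    target = rates[i_ref] / 2.0
    for j in range(i_ref + 1, len(rates)):
        if rates[j] <= target:
            r0, r1 = rates[j - 1], rates[j]
            t0, t1 = times[j - 1], times[j]
            if r0 == r1:
                return t1
            return t0 + (r0 - target) * (t1 - t0) / (r0 - r1)
    return None


# Synthetic trace (not FIG. 3 data): rate = r0 * exp(-k t), 60 s half-life,
# sampled every 1 s for 8 minutes like the DU64 recording.
k = math.log(2) / 60.0
r0 = 0.01  # absorbance units per second, arbitrary
times = [float(t) for t in range(481)]
absorbance = [r0 / k * (1.0 - math.exp(-k * t)) for t in times]

rates = first_derivative(times, absorbance)
print(round(half_rate_time(times, rates), 1))
```

Under this reading, a 60 s exponential half-life crosses the threshold at 15 + 60 = 75 s. Whether the reported 75 s is counted from t = 0 or from t = 15 s therefore changes the implied inactivation rate.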

METHODS 

1. Model structures were built by homology based on the
x-ray structures of trypsin/BPTI. A refined plasminogen
structure was modelled by homology to thrombin using the
PPACK/thrombin x-ray structure from Bode et al. (Bode, W.
et al., EMBO J. 8:3467-3475 (1989)). A refined alpha-2-antiplasmin
[A2AP] structure was modelled by homology to
ovalbumin using atomic co-ordinates from the Brookhaven
Protein Data Bank entry 1OVA, except for the loop containing
the reactive bond, which was modelled using the
co-ordinates for residues 13 to 19 of BPTI from the PDB
entry 2PTC. The alignment used to generate the A2AP
model is shown in FIG. 4. The A2AP model described here
does not include co-ordinates for the 79 N-terminal residues
and 55 C-terminal residues.

Most serine-protease-directed inhibitors react with cognate
enzymes according to a common, substrate-like standard
mechanism (Bode, W. and Huber, R., Eur. J. Biochem.
204:433-451 (1992)). In particular, they all possess an
exposed active-site-binding loop with a characteristic
canonical conformation. The binding loop on the A2AP
model was therefore modelled on the equivalent loop of
BPTI (residues 13 to 19), using atomic co-ordinates from the
PDB entry 2PTC (in which BPTI is complexed with
trypsin).

The complex of A2AP and the plasmin serine protease
domain was modelled using the trypsin/BPTI complex structure
from PDB entry 2PTC. The A2AP model was fitted to
the BPTI structure by optimising the RMS difference
between the co-ordinates of the backbone atoms in the active
site-binding loops of the two inhibitors. The plasmin serine
protease domain model was fitted to the trypsin structure by
optimising the RMS difference between the co-ordinates of
the C-alpha atoms of the conserved residues in an optimal
sequence alignment of the two proteins. The A2AP/plasmin
complex model was then refined by energy-minimisation.
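Both fitting steps above minimise an RMS difference between paired atoms. The text does not say which algorithm Quanta used, so the sketch below swaps in the Kabsch (SVD) method, a standard least-squares superposition. The function name and the synthetic 7-atom "loop" are hypothetical. In practice the rows would be the paired backbone atoms of the two binding loops, or the C-alpha atoms of the aligned conserved residues.

```python
import numpy as np


def kabsch_fit(mobile, target):
    """Least-squares superposition of `mobile` onto `target` (paired N x 3 rows).

    Returns (fitted coordinates, RMSD after fitting)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mob_centre = mobile.mean(axis=0)
    tgt_centre = target.mean(axis=0)
    p = mobile - mob_centre
    q = target - tgt_centre
    h = p.T @ q                                   # 3 x 3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0   # avoid a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    fitted = p @ rot.T + tgt_centre
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - target) ** 2, axis=1))))
    return fitted, rmsd


# Synthetic check: a 7-atom "loop" and a rotated, translated copy of it.
rng = np.random.default_rng(0)
loop_a = rng.normal(size=(7, 3)) * 3.0
angle = np.deg2rad(40.0)
rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle), np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
loop_b = loop_a @ rz.T + np.array([5.0, -2.0, 1.0])
fitted, rmsd = kabsch_fit(loop_a, loop_b)
print(f"RMSD after fit: {rmsd:.1e}")
```

Because the second loop is an exact rigid copy of the first, the fitted RMSD should be zero to within rounding. Real model/template pairs leave a residual that measures how well the loops match.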

The homology modelling was performed on a Silicon
Graphics Indigo workstation using the Quanta molecular
modelling program from Molecular Simulations Incorporated.
Sequence alignments were produced using Quanta,
the GCG sequence analysis software from the University of
Wisconsin (Devereux, Haeberli and Smithies, Nucleic Acids
Research 12(1):387-395 (1984)), and proprietary sequence
alignment software. However, the actual method by which
the homology models were built is not critical to this
invention.

The trypsin and BPTI sequences used in the homology
modelling were obtained from the Brookhaven Protein Data
Bank atomic co-ordinate entry 2PTC, the thrombin sequence
from the PPACK/thrombin co-ordinate file, the
plasminogen sequence from the SWISSPROT database
entry PLMN_HUMAN, and the A2AP sequence from the
SWISSPROT entry A2AP_HUMAN.

2. Mung Bean Nuclease Digestion 

10 units of mung bean nuclease were added to approximately
1 ng DNA which had been digested with a restriction
enzyme, in a buffer containing 30 mM NaOAc pH 5.0, 100
mM NaCl, 2 mM ZnCl2, 10% glycerol. The mung bean
nuclease was incubated at 37° for 30 minutes and then inactivated for
15 minutes at 67°, before the reaction was phenol extracted and ethanol
precipitated.

3. Oligonucleotide synthesis 

The oligonucleotides were synthesised by automated
phosphoramidite chemistry using cyanoethyl phosphoramidites.
The methodology is now widely used and has been
described (Beaucage, S. L. and Caruthers, M. H., Tetrahedron
Letters 24, 245 (1981) and Caruthers, M. H., Science
230, 281-285 (1985)).

4. Purification of Oligonucleotides 

The oligonucleotides were de-protected and removed
from the CPG support by incubation in concentrated NH3.
Typically, 50 mg of CPG carrying 1 micromole of oligonucleotide
was de-protected by incubation for 5 hours at 70°
in 600 µl of concentrated NH3. The supernatant was transferred
to a fresh tube and the oligomer precipitated with 3
volumes of ethanol. Following centrifugation, the pellet was
dried and resuspended in 1 ml of water. The concentration of
crude oligomer was then determined by measuring the
absorbance at 260 nm. For gel purification, 10 absorbance
units of the crude oligonucleotide were dried down and
resuspended in 15 µl of marker dye (90% de-ionised
formamide, 10 mM Tris, 10 mM borate, 1 mM EDTA, 0.1%
bromophenol blue). The samples were heated at 90° for 1
minute and then loaded onto a 1.2 mm thick denaturing
polyacrylamide gel with 1.6 mm wide slots. The gel was
prepared from a stock of 15% acrylamide, 0.6% bisacrylamide
and 7M urea in 1X TBE and was polymerised with
0.1% ammonium persulphate and 0.025% TEMED. The gel
was pre-run for 1 hr. The samples were run at 1500 V for 4-5
hours. The bands were visualised by UV shadowing, and
those corresponding to the full-length product were cut out and
transferred to micro-test tubes. The oligomers were eluted
from the gel slice by soaking in AGEB (0.5M ammonium
acetate, 0.01M magnesium acetate and 0.1% SDS) overnight.
The AGEB buffer was then transferred to fresh tubes
and the oligomer precipitated with three volumes of ethanol
at 70° for 15 mins. The precipitate was collected by centrifugation
in an Eppendorf microfuge for 10 mins, the pellet
washed in 80% ethanol, and the purified oligomer dried, redissolved
in 1 ml of water and finally filtered through a 0.45
micron micro-filter. (The word EPPENDORF is a trade
mark.) The concentration of purified product was measured
by determining its absorbance at 260 nm.
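Converting the 260 nm readings above into concentrations, and then into the 100 pmol used for kinasing, requires an extinction coefficient. This minimal sketch assumes the commonly quoted per-nucleotide values at 260 nm in a simple additive model, which ignores nearest-neighbour effects; neither the values nor the model come from the source. The A260 reading and the dilution in the example are hypothetical.

```python
# Commonly quoted molar extinction coefficients of single nucleotides at
# 260 nm (M^-1 cm^-1). Simple additive model: nearest-neighbour effects ignored.
EPSILON_260 = {"A": 15400, "C": 7400, "G": 11500, "T": 8700}


def extinction_coefficient(seq):
    return sum(EPSILON_260[base] for base in seq.upper())


def concentration_uM(a260, seq, dilution=1.0, path_cm=1.0):
    """Beer-Lambert c = A / (epsilon * l), in µM (equivalently pmol/µl)."""
    return a260 * dilution / (extinction_coefficient(seq) * path_cm) * 1e6


def volume_for_pmol(pmol, conc_uM):
    """µl of stock supplying `pmol` picomoles (µM = pmol/µl)."""
    return pmol / conc_uM


primer = "CTCGTACGAAGCAGGACTTGCCAG"  # SEQ ID NO:6
# Hypothetical reading: A260 = 0.52 on a 1 in 100 dilution of the 1 ml stock.
stock = concentration_uM(0.52, primer, dilution=100)
print(f"epsilon = {extinction_coefficient(primer)} M^-1 cm^-1")
print(f"stock = {stock:.1f} uM; {volume_for_pmol(100, stock):.2f} ul per 100 pmol")
```

Nearest-neighbour coefficients are more accurate for short oligomers, but the additive sum is usually good enough for setting up kinase and annealing reactions.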

5. Kinasing of Oligomers 

100 pmole of oligomer was dried down and resuspended
in 20 µl kinase buffer (70 mM Tris pH 7.6, 10 mM MgCl2,
1 mM ATP, 0.2 mM spermidine, 0.5 mM dithiothreitol). 10
u of T4 polynucleotide kinase was added and the mixture
incubated at 37° for 30 mins. The kinase was then inactivated
by heating at 70° for 10 mins.

6. Dideoxy Sequencing 

The protocol used was essentially as has been described
(Biggin, M. D., Gibson, T. J., Hong, G. F., P.N.A.S. 80,
3963-3965 (1983)). Where appropriate, the method was
modified to allow sequencing on plasmid DNA as has been
described (Guo, L.-H., Wu, R., Nucleic Acids Research 11,
5521-5540 (1983)).

7. Transformation 

Transformation was accomplished using standard procedures.
The strain used as a recipient in the cloning using
plasmid vectors was HW87 or DH5, which has the following
genotype:

araD139 (ara-leu)del7697 (lacIPOZY)del74 galU galK hsdR rpsL
srl recA56

RZ1032 is a derivative of E. coli that lacks two enzymes
of DNA metabolism: (a) dUTPase (dut), whose absence results in a
high concentration of intracellular dUTP, and (b) uracil
N-glycosylase (ung), which is responsible for removing
misincorporated uracils from DNA (Kunkel et al., Methods in
Enzymol., 154, 367-382 (1987)). Its principal benefit is that
these mutations lead to a higher frequency of mutants in site-directed
mutagenesis. RZ1032 has the following genotype:

HfrKL16 PO/45[lysA(61-62)], dut1, ung1, thi1, relA1, Zbd-279::Tn10,
supE44

JM103 is a standard recipient strain for manipulations
involving M13-based vectors.
8. Site Directed Mutagenesis 

Kinased mutagenesis primer (2.5 pmole) was annealed to
the single stranded template DNA (1 µg), which was prepared
using RZ1032 as host, in a final reaction mix of 10 µl
containing 70 mM Tris, 10 mM MgCl2. The reaction mixture,
in a polypropylene micro-test tube (EPPENDORF), was
placed in a beaker containing 250 ml of water at 70° C. for
3 minutes, followed by 37° C. for 30 minutes. The annealed
mixture was then placed on ice and the following reagents
added: 1 µl of 10X TM (700 mM Tris, 100 mM MgCl2,
pH 7.6), 1 µl of a mixture of all 4 deoxyribonucleotide
triphosphates each at 5 mM, 2 µl of T4 DNA ligase (100 u),
0.5 µl Klenow fragment of DNA polymerase and 4.5 µl of
water. The polymerase reaction mixture was then incubated
at 15° for 4-16 hrs. After the reaction was complete, 180 µl
of TE (10 mM Tris, 1 mM EDTA, pH 8.0) was added and the
mutagenesis mixture stored at -20° C. For the isolation of
mutant clones, the mixture was then transformed into the
recipient JM103 as follows. A 5 ml overnight culture of
JM103 in 2 X YT (1.6% Bactotryptone, 1% Yeast Extract,
1% NaCl) was diluted 1 in 100 into 50 ml of pre-warmed
2 X YT. The culture was grown at 37° with aeration until the
A600 reached 0.4. The cells were pelleted, resuspended
in 0.5 vol of 50 mM CaCl2 and kept on ice for 15 mins. The
cells were then re-pelleted at 4° and resuspended in 2.5 ml
cold 50 mM CaCl2. For the transfection, 0.25, 1, 2, 5, 20 and
50 µl aliquots of the mutagenesis mixture were added to 200
µl of competent cells, which were kept on ice for 30 mins.
The cells were then heat shocked at 42° for 2 mins. To
each tube was then added 3.5 ml of YT soft agar containing
0.2 ml of a late exponential culture of JM103; the contents
were mixed briefly and then poured onto the surface of a
pre-warmed plate containing 2 X YT solidified with 1.5%
agar. The soft agar layer was allowed to set and the plates
then incubated at 37° overnight.

Single stranded DNA was then prepared from isolated
clones as follows. Single plaques were picked into 4 ml of 2
X YT that had been seeded with 10 µl of a fresh overnight
culture of JM103 in 2 X YT. The culture was shaken
vigorously for 6 hrs. 0.5 ml of the culture was then removed
and added to 0.5 ml of 50% glycerol to give a reference
stock that was stored at -20°. The remaining culture was
centrifuged to remove the cells and 1 ml of supernatant
carrying the phage particles was transferred to a fresh
EPPENDORF tube. 250 µl of 20% PEG6000, 250 mM NaCl
was then added and mixed, and the tubes were incubated on ice for 15
mins. The phage were then pelleted at 10,000 rpm for 10
mins, the supernatant discarded and the tubes re-centrifuged
to collect the final traces of PEG solution, which could then
be removed and discarded. The phage pellet was thoroughly
resuspended in 200 µl of TEN (10 mM Tris, 1 mM EDTA,
0.3M NaOAc). The DNA was isolated by extraction with an
equal volume of Tris-saturated phenol. The phases were
separated by a brief centrifugation and the aqueous phase
transferred to a clean tube. The DNA was re-extracted with
a mixture of 100 µl of phenol and 100 µl chloroform, and the
phases again separated by centrifugation. Traces of phenol
were removed by three subsequent extractions with chloroform,
and the DNA finally isolated by precipitation with 2.5
volumes of ethanol at -20° overnight. The DNA was pelleted
at 10,000 rpm for 10 min, washed in 70% ethanol,
dried and finally resuspended in 50 µl of TE.
9. Electroporation 

Chinese hamster ovary cells (CHO) or the mouse
myeloma cell line p3x63-Ag8.653 were grown and harvested
in mid log growth phase. The cells were washed and
resuspended in PBS, and a viable cell count was made. The
cells were then pelleted and resuspended at 1×10^7 cells/ml.
40 µg of linearised DNA was added to 1 ml of cells and
allowed to stand on ice for 15 mins. One pulse of 800 V/25
µF was administered to the cells using a commercially
available electroporation apparatus (BIORAD GENE
PULSER, trade mark). The cells were incubated on ice for
a further 15 mins and then plated into 5 × 96-well plates with
200 µl of medium per well (DMEM, 5% FCS, Pen/Strep,
glutamine) or 3 × 9 cm dishes with 10 ml medium in each
dish, and incubated overnight. After 24 hrs the medium was
removed and replaced with selective medium containing xanthine
(250 µg/ml), mycophenolic acid (5 µg/ml) and
1× hypoxanthine-thymidine (HT). The cells were fed every
third day. After about 14 days, gpt-resistant colonies were
evident in some of the wells and on the plates. The plates
were screened for plasminogen by removing an aliquot of
medium from each well or plate and assaying it by ELISA.
Clones producing plasminogen were scaled up and their
expression levels monitored to allow selection of the best
producer.

10. ELISA for Human Plasminogen 

ELISA plates (Pro-Bind, Falcon) are coated with 50
µl/well of goat anti-human plasminogen serum (Sigma)
diluted 1:1000 in coating buffer (4.0 g Na2CO3·10H2O,
2.93 g NaHCO3 per liter H2O, pH 9.6) and incubated
overnight at 4° C. Coating solution is then removed and
plates are blocked by incubating with 50 µl/well of PBS/
0.1% casein at room temperature for 15 minutes. Plates are
then washed 3 times with PBS/0.05% Tween 20. Samples of
plasminogen or standards diluted in PBS/Tween are added to
the plate and incubated at room temperature for 2 hours. The
plates are then washed 3 times with PBS/Tween, and 50
µl/well of a 1:1000 dilution in PBS/Tween of a monoclonal
anti-human plasminogen antibody (e.g. #3641 and #3642 from
American Diagnostica, New York, U.S.A.) is added and
incubated at room temperature for 1 hour. The plates are
again washed 3 times with PBS/Tween, and 50 µl/well
of horse radish peroxidase conjugated goat anti-mouse IgG
(Sigma) is added and incubated at room temperature for 1
hour. Alternatively, the bound plasminogen is revealed by
incubation with 50 µl/well of horse radish peroxidase conjugated
sheep anti-human plasminogen (The Binding Site).
The plates are washed 5 times with PBS/Tween and then
incubated with 100 µl/well of peroxidase substrate (0.1M
sodium acetate/citric acid buffer pH 6.0 containing 100
mg/liter 3,3',5,5'-tetramethyl benzidine and 13 mM H2O2).
The reaction is stopped after approximately 5 minutes by the
addition of 25 µl/well of 2.5M sulphuric acid, and the
absorbance at 450 nm is read on a platereader.
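The protocol does not say how sample absorbances are converted to plasminogen concentrations. One simple approach, assumed here, is to interpolate each reading between the two bracketing standards on a log-concentration scale. Four-parameter logistic fits are the more common choice when a curve-fitting library is available. The function name and the standard values are hypothetical.

```python
import bisect
import math


def estimate_concentration(standards, od, dilution=1.0):
    """Concentration for an OD450 reading, by linear interpolation of OD
    against log10(concentration) between the two bracketing standards.

    `standards` is a list of (concentration > 0, OD) pairs whose OD rises with
    concentration. Returns None outside the standard range (re-dilute instead
    of extrapolating)."""
    pts = sorted(standards, key=lambda s: s[1])
    ods = [o for _, o in pts]
    if not ods[0] <= od <= ods[-1]:
        return None
    i = bisect.bisect_left(ods, od)
    if ods[i] == od:
        return pts[i][0] * dilution
    (c0, o0), (c1, o1) = pts[i - 1], pts[i]
    frac = (od - o0) / (o1 - o0)
    log_c = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
    return (10 ** log_c) * dilution


# Hypothetical glu-plasminogen standards: (ng/ml, OD450). Not data from the text.
STANDARDS = [(1.0, 0.10), (3.0, 0.25), (10.0, 0.60), (30.0, 1.20), (100.0, 1.90)]
print(estimate_concentration(STANDARDS, 0.425, dilution=50))
```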

11. Purification of Plasminogen Variants

Plasminogen variants are purified in a single step by
chromatography on lysine SEPHAROSE 4B (Pharmacia). A
column is equilibrated with at least 10 column volumes of
0.05M sodium phosphate buffer pH 7.5. The column is
loaded with conditioned medium at a ratio of 1 ml resin per
0.6 mg of plasminogen variant, as determined by ELISA
using human glu-plasminogen as standard. Typically, 400 ml
of conditioned medium containing plasminogen are applied
to a 10 ml column (H:D = 4) at a linear flow rate of 56
ml/cm²/h at 4° C. After loading is complete, the column is
washed with a minimum of 5 column volumes of 0.05M
phosphate buffer pH 7.5 containing 0.5M NaCl, until non-specifically
bound protein ceases to be eluted. Desorption of
bound plasminogen is achieved by the application of 0.2M
epsilon-amino-caproic acid in de-ionised water pH 7.0.
Elution requires 2 column volumes and is carried out at a
linear flow rate of 17 ml/cm²/h. Following analysis by SDS
PAGE to check purity, the epsilon-amino-caproic acid is
removed and replaced with a suitable buffer,
e.g. Tris, PBS, HEPES or acetate, by chromatography on
pre-packed, disposable PD-10 columns containing SEPHADEX
G-25M (Pharmacia). (The word SEPHADEX is a trade
mark.) Typically, 2.5 ml of each plasminogen mutant at a
concentration of 0.3 mg/ml are processed in accordance with
the manufacturer's instructions. Fractions containing
plasminogen, as determined by A280, are then pooled.
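The column parameters above translate into volumetric flow rates and a loading capacity as follows. This sketch reads the garbled geometry as a cylinder with height:diameter = 4 and the flow units as ml/cm²/h (that is, cm/h). Both are our assumptions, and the helper names are hypothetical.

```python
import math


def column_geometry(bed_volume_ml, height_to_diameter):
    """Diameter (cm) and cross-section (cm^2) of a cylindrical bed with
    V = pi * (D / 2)**2 * H and H = ratio * D."""
    diameter = (4.0 * bed_volume_ml / (math.pi * height_to_diameter)) ** (1.0 / 3.0)
    return diameter, math.pi * diameter ** 2 / 4.0


def load_plan(bed_volume_ml=10.0, h_to_d=4.0, load_cm_per_h=56.0,
              elute_cm_per_h=17.0, medium_ml=400.0, mg_per_ml_resin=0.6):
    """Volumetric flows, loading time and capacity for the lysine column."""
    diameter, area = column_geometry(bed_volume_ml, h_to_d)
    load_ml_per_h = load_cm_per_h * area      # (ml/cm^2/h) x cm^2 = ml/h
    return {
        "diameter_cm": diameter,
        "area_cm2": area,
        "load_ml_per_h": load_ml_per_h,
        "elute_ml_per_h": elute_cm_per_h * area,
        "load_time_h": medium_ml / load_ml_per_h,
        "max_load_mg": bed_volume_ml * mg_per_ml_resin,
        # highest titre at which the whole medium volume stays within capacity
        "max_titre_ug_per_ml": 1000.0 * bed_volume_ml * mg_per_ml_resin / medium_ml,
    }


for key, value in load_plan().items():
    print(f"{key}: {value:.3g}")
```

Under these assumptions, the 10 ml bed is about 1.47 cm wide and loads at roughly 95 ml/h, so 400 ml takes a little over 4 hours. The 1 ml per 0.6 mg ratio caps the load at 6 mg, which corresponds to a titre of about 15 µg/ml in 400 ml.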



SEQUENCE LISTING

( 1 ) GENERAL INFORMATION:

( iii ) NUMBER OF SEQUENCES: 10

( 2 ) INFORMATION FOR SEQ ID NO:1:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 690 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: cDNA

( iii ) HYPOTHETICAL: NO

( iv ) ANTI-SENSE: NO

( vi ) ORIGINAL SOURCE:
( A ) ORGANISM: Homo sapiens

( ix ) FEATURE:
( A ) NAME/KEY: CDS
( B ) LOCATION: 1..690
( D ) OTHER INFORMATION: /partial
/codon_start=1
/function="encodes plasmin protease domain"
/product="nucleotide with corresponding protein"
/number=1

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GTT GTA GGG GGG TGT GTG GCC CAC CCA CAT TCC TGG CCC TGG CAA GTC 48
Val Val Gly Gly Cys Val Ala His Pro His Ser Trp Pro Trp Gln Val
1 5 10 15

AGT CTT AGA ACA AGG TTT GGA ATG CAC TTC TGT GGA GGC ACC TTG ATA 96
Ser Leu Arg Thr Arg Phe Gly Met His Phe Cys Gly Gly Thr Leu Ile
20 25 30

TCC CCA GAG TGG GTG TTG ACT GCT GCC CAC TGC TTG GAG AAG TCC CCA 144
Ser Pro Glu Trp Val Leu Thr Ala Ala His Cys Leu Glu Lys Ser Pro
35 40 45

AGG CCT TCA TCC TAC AAG GTC ATC CTG GGT GCA CAC CAA GAA GTG AAT 192
Arg Pro Ser Ser Tyr Lys Val Ile Leu Gly Ala His Gln Glu Val Asn
50 55 60

CTC GAA CCG CAT GGT CAG GAA ATA GAA GTG TCT AGG CTG TTC TTG GAG 240
Leu Glu Pro His Gly Gln Glu Ile Glu Val Ser Arg Leu Phe Leu Glu
65 70 75 80

CCC ACA CGA AAA GAT ATT GCC TTG CTA AAG CTA AGC AGT CCT GCC GTC 288
Pro Thr Arg Lys Asp Ile Ala Leu Leu Lys Leu Ser Ser Pro Ala Val
85 90 95

ATC ACT GAC AAA GTA ATC CCA GCT TGT CTG CCA TCC CCA AAT TAT GTG 336
Ile Thr Asp Lys Val Ile Pro Ala Cys Leu Pro Ser Pro Asn Tyr Val
100 105 110

GTC GCT GAC CGG ACC GAA TGT TTC ATC ACT GGC TGG GGA GAA ACC CAA 384
Val Ala Asp Arg Thr Glu Cys Phe Ile Thr Gly Trp Gly Glu Thr Gln
115 120 125

GGT ACT TTT GGA GCT GGC CTT CTC AAG GAA GCC CAG CTC CCT GTG ATT 432
Gly Thr Phe Gly Ala Gly Leu Leu Lys Glu Ala Gln Leu Pro Val Ile
130 135 140

GAG AAT AAA GTG TGC AAT CGC TAT GAG TTT CTG AAT GGA AGA GTC CAA 480
Glu Asn Lys Val Cys Asn Arg Tyr Glu Phe Leu Asn Gly Arg Val Gln
145 150 155 160

TCC ACC GAA CTC TGT GCT GGG CAT TTG GCC GGA GGC ACT GAC AGT TGC 528
Ser Thr Glu Leu Cys Ala Gly His Leu Ala Gly Gly Thr Asp Ser Cys
165 170 175

CAG GGT GAC AGT GGA GGT CCT CTG GTT TGC TTC GAG AAG GAC AAA TAC 576
Gln Gly Asp Ser Gly Gly Pro Leu Val Cys Phe Glu Lys Asp Lys Tyr
180 185 190

ATT TTA CAA GGA GTC ACT TCT TGG GGT CTT GGC TGT GCA CGC CCC AAT 624
Ile Leu Gln Gly Val Thr Ser Trp Gly Leu Gly Cys Ala Arg Pro Asn
195 200 205

AAG CCT GGT GTC TAT GTT CGT GTT TCA AGG TTT GTT ACT TGG ATT GAG 672
Lys Pro Gly Val Tyr Val Arg Val Ser Arg Phe Val Thr Trp Ile Glu
210 215 220

GGA GTG ATG AGA AAT AAT 690
Gly Val Met Arg Asn Asn
225 230

( 2 ) INFORMATION FOR SEQ ID NO:2:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 230 amino acids
( B ) TYPE: amino acid
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: protein

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Val Val Gly Gly Cys Val Ala His Pro His Ser Trp Pro Trp Gln Val
1 5 10 15

Ser Leu Arg Thr Arg Phe Gly Met His Phe Cys Gly Gly Thr Leu Ile
20 25 30

Ser Pro Glu Trp Val Leu Thr Ala Ala His Cys Leu Glu Lys Ser Pro
35 40 45

Arg Pro Ser Ser Tyr Lys Val Ile Leu Gly Ala His Gln Glu Val Asn
50 55 60

Leu Glu Pro His Gly Gln Glu Ile Glu Val Ser Arg Leu Phe Leu Glu
65 70 75 80

Pro Thr Arg Lys Asp Ile Ala Leu Leu Lys Leu Ser Ser Pro Ala Val
85 90 95

Ile Thr Asp Lys Val Ile Pro Ala Cys Leu Pro Ser Pro Asn Tyr Val
100 105 110

Val Ala Asp Arg Thr Glu Cys Phe Ile Thr Gly Trp Gly Glu Thr Gln
115 120 125

Gly Thr Phe Gly Ala Gly Leu Leu Lys Glu Ala Gln Leu Pro Val Ile
130 135 140

Glu Asn Lys Val Cys Asn Arg Tyr Glu Phe Leu Asn Gly Arg Val Gln
145 150 155 160

Ser Thr Glu Leu Cys Ala Gly His Leu Ala Gly Gly Thr Asp Ser Cys
165 170 175

Gln Gly Asp Ser Gly Gly Pro Leu Val Cys Phe Glu Lys Asp Lys Tyr
180 185 190

Ile Leu Gln Gly Val Thr Ser Trp Gly Leu Gly Cys Ala Arg Pro Asn
195 200 205

Lys Pro Gly Val Tyr Val Arg Val Ser Arg Phe Val Thr Trp Ile Glu
210 215 220

Gly Val Met Arg Asn Asn
225 230

( 2 ) INFORMATION FOR SEQ ID NO:3:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 24 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: cDNA

( iii ) HYPOTHETICAL: NO

( ix ) FEATURE:
( A ) NAME/KEY: misc_feature
( B ) LOCATION: 1..24
( D ) OTHER INFORMATION: /function="MUTAGENESIS PRIMER FOR A1"
/product="SYNTHETIC DNA"

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:3:

GGTGCCTCCA CAATTGTGCA TTCC 24

( 2 ) INFORMATION FOR SEQ ID NO:4:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 27 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: cDNA

( ix ) FEATURE:
( A ) NAME/KEY: misc_feature
( B ) LOCATION: 1..27
( D ) OTHER INFORMATION: /function="MUTAGENESIS PRIMER FOR A3"
/product="SYNTHETIC DNA"

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:4:

GTTCGAGATT CACTTTTTGG TGTGCAC 27

( 2 ) INFORMATION FOR SEQ ID NO:5:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 24 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: cDNA

( ix ) FEATURE:
( A ) NAME/KEY: misc_feature
( B ) LOCATION: 1..24
( D ) OTHER INFORMATION: /function="MUTAGENESIS PRIMER FOR A4"
/product="SYNTHETIC DNA"

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:5:

CTTGGGGACT TCTTCAAGCA GTGG 24

( 2 ) INFORMATION FOR SEQ ID NO:6:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 24 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: cDNA

( ix ) FEATURE:
( A ) NAME/KEY: misc_feature
( B ) LOCATION: 1..24
( D ) OTHER INFORMATION: /function="MUTAGENESIS PRIMER USED FOR A5"
/product="SYNTHETIC DNA"

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:6:

CTCGTACGAA GCAGGACTTG CCAG 24

( 2 ) INFORMATION FOR SEQ ID NO:7:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 27 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: cDNA

( ix ) FEATURE:
( A ) NAME/KEY: misc_feature
( B ) LOCATION: 1..27
( D ) OTHER INFORMATION: /function="MUTAGENESIS PRIMER FOR A12"
/product="SYNTHETIC DNA"

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:7:

CCAAACCTTG TTTCAAGACT GACTTGC 27

( 2 ) INFORMATION FOR SEQ ID NO:8:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 24 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: cDNA

( ix ) FEATURE:
( A ) NAME/KEY: misc_feature
( B ) LOCATION: 1..24
( D ) OTHER INFORMATION: /function="MUTAGENESIS PRIMER FOR A14"
/product="SYNTHETIC DNA"

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:8:

CTTGGGGACT TGGCTAGACA GTGG 24

( 2 ) INFORMATION FOR SEQ ID NO:9:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: cDNA

( ix ) FEATURE:
( A ) NAME/KEY: misc_feature
( B ) LOCATION:
( D ) OTHER INFORMATION: /function="MUTAGENESIS PRIMER FOR A15"
/product="SYNTHETIC DNA"

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:9:

CTTGGGGACT TCCTTAGACA GTGGG 25

( 2 ) INFORMATION FOR SEQ ID NO:10:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 27 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: synthetic DNA

( ix ) FEATURE:
( A ) NAME/KEY: misc_feature
( B ) LOCATION: 1..27
( D ) OTHER INFORMATION: /function="MUTAGENESIS PRIMER FOR A16"
/product="SYNTHETIC DNA"

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:10:

GTTCGAGATT CACTGCTTGG TGTGCAC 27



We claim: 

1. A plasmin modified so as to exhibit resistance to 
inhibitors of plasmin, characterized in that the modification 
comprises the mutation of the residue in a region corre- 
sponding to residue 17 according to the numbering of SEQ 
ID NO 2. 

2. A plasmin modified so as to exhibit resistance to 
inhibitors of plasmin, characterized in that the modification 
comprises the mutation of one or more residues in a region 
corresponding to residues 44 to 54 according to the num- 
bering of SEQ ID NO 2. 

3. A plasmin modified so as to exhibit resistance to 
inhibitors of plasmin, characterized in that the modification 
comprises the mutation of the residue in a region corre- 
sponding to residue 45 according to the numbering of SEQ 
ID NO 2. 

4. A plasmin modified so as to exhibit resistance to 
inhibitors of plasmin, characterized in that the modification 
comprises the mutation of the residue in a region corre- 
sponding to residue 62 according to the numbering of SEQ 
ID NO 2. 

5. A plasmin modified so as to exhibit resistance to 
inhibitors of plasmin, characterized in that the modification 
comprises the mutation of one or more residues in a region
corresponding to residues 202 or 203 according to the 
numbering of SEQ ID NO 2. 

6. A plasmin modified so as to exhibit resistance to 
inhibitors of plasmin, characterized in that the modification
comprises the mutation of one or more residues in a region
or regions corresponding to residues 17, 44 to 54, 62, 202 
and 203, according to the numbering of SEQ ID NO 2. 

7. A plasmin as claimed in claim 6, which has one or more 
of the following mutations: Ser-17 to Leu, Glu-45 to Lys or 
Arg, or Glu-62 to Lys or Ala, according to the numbering of 
SEQ ID NO 2. 

8. A plasmin as claimed in claim 7, which has the 
following mutations: Glu-62 to Lys and Glu-45 to Lys, 
according to the numbering of SEQ ID NO 2.

9. A plasmin precursor, which, when cleaved, forms a
plasmin modified so as to exhibit resistance to inhibitors of
plasmin, characterized in that the modification comprises the
mutation of one or more residues in a region or regions
corresponding to residues 17, 44 to 54, 62, 202, and 203,
according to the numbering of SEQ ID NO 2.

10. A plasmin precursor, which, when cleaved, forms a 
modified plasmin as claimed in claim 7. 
11. A plasmin precursor, which, when cleaved, forms said
modified plasmin of claim 8.

12. An isolated nucleotide sequence coding for said 
plasmin precursor of claim 9. 

13. The isolated nucleotide sequence of claim 12, further
comprising a first nucleic acid sequence coding for said
modified plasmin, operatively linked to a second nucleic
acid sequence containing a strong promoter and enhancer
sequence derived from human cytomegalovirus, a third
nucleic acid sequence encoding a polyadenylation sequence
derived from SV40, and a fourth nucleic acid sequence
coding for a selectable marker expressed from an SV40
promoter and having an additional SV40 polyadenylation
signal at the 3' end of the selectable marker sequence.

14. An expression vector comprising the nucleic acid
sequence as in claims 12 or 13.

15. The vector of claim 14, wherein said vector is selected
from the group consisting of a plasmid, a cosmid, and a
phage.

16. A cell transformed or transfected with the expression
vector of claim 14.

17. The cell of claim 16, wherein said cell is additionally
transfected or transformed by an expression vector comprising
a nucleic acid sequence coding for a plasmin inhibitor.

18. The cell of claim 17, wherein said plasmin inhibitor is
selected from the group consisting of alpha2-antiplasmin,
alpha2-macroglobulin and alpha1-antitrypsin.

19. A pharmaceutical composition comprising a modified
plasmin as claimed in any one of claims 1 to 8, together with
a pharmaceutically acceptable carrier.

20. A pharmaceutical composition comprising a modified
plasmin precursor as claimed in any one of claims 9 to 11,
together with a pharmaceutically acceptable carrier.

21. A veterinary composition for use in mammals, comprising
a modified plasmin as claimed in any one of claims
1 to 8, together with a carrier acceptable for veterinary use.

22. A veterinary composition for use in mammals, comprising
a modified plasmin precursor as claimed in any one
of claims 9 to 11, together with a carrier acceptable for
veterinary use.
ABSTRACT

The University of Wisconsin Genetics Computer Group (UWGCG) has been
organized to develop computational tools for the analysis and publication of
biological sequence data. A group of programs that will interact with each
other has been developed for the Digital Equipment Corporation VAX computer
using the VMS operating system. The programs available and the conditions for
transfer are described.

INTRODUCTION

The rapid advances in the field of molecular genetics and DNA sequencing
have made it imperative for many laboratories to use computers to analyze and
manage sequence data. UWGCG was founded when it became clear to several
faculty members at the University of Wisconsin that there was no set of
sequence analysis programs that could be used together as a coherent system
and be modified easily in response to new ideas.

With intramural support, a computer group was organized to build a strong
foundation of software upon which future programs in molecular genetics could
be based. This initial project has been completed and the resulting programs,
written in Fortran 77, are available for VAX computers using the VMS operating
system. Most of the programs can be used with only a terminal, although
several require a Hewlett Packard plotter.

UWGCG software has been installed for testing at eight different 
institutions. A simple method has been developed for transferring and 
maintaining this system on other VAX computers. 

DESIGN PRINCIPLES 

UWGCG program design is based on the "software tools" approach of
Kernighan and Plauger(1). Each program performs a simple function and is easy
to use. The programs can be used independently in different combinations, so
that complex problems are solved by the use of several programs in succession.
New programming is simplified, since less effort is required to bridge a gap
between existing programs.

UWGCG software is designed to be maintained and modified at sites other
than the University of Wisconsin. The program manual is extensive and the
source codes are organised to make modification convenient. Scientists using
UWGCG software are encouraged to use existing programs as a framework for
developing new ones. Our copyright can be removed from any program modified
by more than 25% of our original effort.

PROGRAMS AVAILABLE FROM UWGCG 

The programs described below are named and defined individually in Table 1. 

Program names in the text are underlined. 

Comparisons 

Comparisons may be done with "dot plots" using the method of Maizel and
Lenk(2). Optimal alignments can be generated by the methods of Needleman and
Wunsch(3) and of Sellers(4), and by the "local homology" method of Smith and
Waterman(5). The Smith and Waterman alignment algorithm is also the most 
sensitive method available for identifying similarities between weakly related 
sequences. 
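To make the local-alignment idea concrete, here is a minimal Python sketch of the Smith and Waterman recurrence. It is an illustration only, not the UWGCG BestFit code: the function name smith_waterman and the scores (match +2, mismatch -1, linear gap penalty -2) are assumptions chosen for the example. Each cell is floored at zero, which lets an alignment start and stop anywhere, so unrelated flanking sequence does not lower the score.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Best local alignment score and one optimal local alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `smith_waterman("XXACGTXX", "ACGT")` reports only the shared ACGT segment, ignoring the unrelated ends of the first sequence.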

Mapping and Searching 

Mapping is available in several formats. Graphic maps display all of the 
cuts for each restriction enzyme on parallel lines. This graphic map 
facilitates selection of enzymes for isolating any region of a sequenced DNA 
molecule. Sorted maps in tabular format arrange the fragments from any 
digestion in order of molecular weight to show which fragments are similar in 
size and thus likely to be confused in gels. Another frequently used mapping 
format, designed by Frederick Blattner(6), displays the enzyme cuts above the 
original DNA sequence. Both strands of the DNA and all six frames of 
translation are shown. 
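A sorted map of the kind described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the MapSort program: it assumes a linear molecule, one enzyme with an unambiguous recognition site, and a cut offset inside the site (EcoRI cuts G^AATTC, so offset=1). Fragment length in base pairs stands in for molecular weight.

```python
def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """Top-strand cut coordinates (number of bases to the left of each cut)."""
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)  # step by one so overlapping sites are found
    return cuts


def sorted_fragments(seq: str, site: str, offset: int) -> list[tuple[int, int, int]]:
    """Fragments of a linear molecule as (length, first base, last base), largest first."""
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    frags = [(end - start, start + 1, end) for start, end in zip(bounds, bounds[1:])]
    return sorted(frags, key=lambda f: f[0], reverse=True)
```

Applied to `"AAGAATTCAAAAAAGAATTCAA"` with `"GAATTC"` and offset 1, this returns `[(12, 4, 15), (7, 16, 22), (3, 1, 3)]`, so fragments of similar size end up next to each other in the listing.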

All mapping programs will search for user-specified sequences, allowing
features to be marked at the appropriate position on a restriction map. The
mapping and searching programs can be used to aid site-specific mutagenesis
experiments by showing where mutations could generate new restriction sites. 
All of the positions in a sequence where a synthetic probe could pair with one 
or more mismatches can also be located. Sequences related to less precisely 
defined features such as promoters or intervening sequence splice sites, can 
be located with a program that uses a consensus sequence as a probe. The
mapping programs can also be used on protein sequences to identify the
peptides resulting from proteolytic cleavage.

Table 1

Programs Available from UWGCG

Name              Function

DotPlot*          makes a dot plot by the method of Maizel and Lenk(2)
Gap               finds optimal alignment by the method of Needleman and Wunsch(3)
BestFit           finds optimal alignment by the method of Smith and Waterman(5)
MapPlot*          shows restriction map for each enzyme graphically
MapSort           tabulates maps sorted by fragment position and size
Map               displays restriction sites and protein translations above
                  and below the original sequence (Blattner(6))
FitConsensus      finds sequences similar to a consensus sequence using a
                  consensus table as a probe
Find              finds sites specified interactively
StemLoop          finds all possible stems (inverted repeats) and loops
Fold†             finds an RNA secondary structure of minimum free energy
                  by the method of Zuker(7)
CodonPreference*  plots the similarity between the codon choices in each
                  reading frame and a codon frequency table(8)
CodonFrequency    tabulates codon frequencies
Correspond        finds similar patterns of codon choice by comparing
                  codon frequency tables (Grantham et al.(9))
TestCode*         finds possible coding regions by plotting
                  the "TestCode" statistic of Fickett(10)
Frame*            plots rare codons and open reading frames(8)
PlotStatistics*   plots asymmetries of composition for one strand
Composition       measures composition, di- and trinucleotide frequencies
Repeat            finds repeats (direct, not inverted)
Fingerprint       shows the labelled fragments expected for an RNA fingerprint
Seqed             screen-oriented sequence editor for entering, editing
                  and checking sequences
Assemble          joins sequences together
Shuffle           randomizes a sequence maintaining composition
Reverse           reverses and/or complements a sequence
Reformat          converts a sequence file from one format to another
Translate         translates a nucleotide into a peptide sequence
BackTranslate     translates a peptide into a nucleotide sequence
Spew              sends a sequence to another computer
GetSeq            accepts a sequence from another computer
Crypt             encrypts a file for access only by password
Simplify          substitutes one of six chemically similar amino acid
                  families for each residue in a peptide sequence
Publish           arranges sequences for publication
Poster*           plots text (for labelling figures and posters)
OverPrint         prints darkened text for figures with a daisy wheel printer

* requires a Hewlett Packard Series 7221 terminal plotter
† Fold is distributed by Dr. Michael Zuker, not UWGCG.

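The probe search described under Mapping and Searching (every position where a probe could pair with up to a given number of mismatches) can be sketched as a sliding comparison. In this hypothetical Python sketch, probe_sites compares the probe with the strand whose sense it matches. To search the opposite strand, pass the probe's reverse complement. Ambiguity symbols are not handled here.

```python
def probe_sites(seq: str, probe: str, max_mismatches: int) -> list[tuple[int, int]]:
    """(1-based position, mismatch count) for every placement of the probe
    with at most max_mismatches mismatching bases."""
    seq, probe = seq.upper(), probe.upper()
    hits = []
    for i in range(len(seq) - len(probe) + 1):
        window = seq[i:i + len(probe)]
        mismatches = sum(1 for s, p in zip(window, probe) if s != p)
        if mismatches <= max_mismatches:
            hits.append((i + 1, mismatches))
    return hits
```

Raising `max_mismatches` from 0 to 1 or 2 shows how many additional sites a synthetic probe might cross-hybridize with.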
Secondary Structure 

Three programs are available to examine secondary structure in nucleic 
acids. The program StemLoop identifies all inverted repeats. An
implementation of Dr. Michael Zuker's Fold program(7) finds an RNA secondary 
structure of minimum free energy based on published values of stacking and 
loop destabilizing energies. The "dot plot" comparison (mentioned above) of a 
sequence compared to its opposite strand gives a graphic picture of the 
pattern of inverted repeats in a sequence. 
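The idea behind a stem search like StemLoop's can be illustrated with a short Python sketch. It is not the UWGCG algorithm: find_stems is a hypothetical name, it reports only perfectly paired DNA stems of a fixed length (no G-T pairs, no mismatches or bulges), and min_loop is an assumed minimum loop size.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_stems(seq: str, stem: int, min_loop: int = 3) -> list[tuple[int, int, int]]:
    """Perfect stems of exactly `stem` bases as (first arm start,
    second arm start, loop length), all 1-based."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n - 2 * stem - min_loop + 1):
        # The downstream arm must read as the reverse complement of this arm
        target = reverse_complement(seq[i:i + stem])
        for j in range(i + stem + min_loop, n - stem + 1):
            if seq[j:j + stem] == target:
                hits.append((i + 1, j + 1, j - (i + stem)))
    return hits
```

For `"GGGAAAACCC"` with 3-base stems, the only hit is `(1, 8, 4)`: GGG pairs with CCC across a four-base loop.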

Analysis of Composition and the Location of Genetic Domains 

Regions of a sequence with non-random base distribution can be displayed 
with three graphic tools designed to identify genetic domains. The program 
CodonPreference (8) identifies potential coding regions by searching through 
each reading frame for a pattern of preferred codon choices. The 
CodonPreference plot predicts the level of translational expression of mRNAs 
and helps identify frame shifts in DNA sequence data. Patterns of codon
choice can be compared with the program Correspond (9). When a strong pattern
of codon preferences is not expected, the "TestCode" statistic of Fickett (10)
can be plotted to show regions of compositional constraint at every third
base. Another program plots asymmetries of composition by strand. Strand
asymmetries have been associated with genetic domains by several
authors (11, 12). A fourth program called Frame marks the positions of rare
codons and open reading frames on a graph showing all six reading frames. 
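The open-reading-frame part of a six-frame display like Frame's can be sketched as follows. This is an assumption-laden illustration, not the UWGCG program. It defines an open reading frame as a stop-to-stop run of at least min_codons codons, uses the standard stop codons TAA, TAG and TGA, and does not mark rare codons. Coordinates for the three bottom-strand frames are mapped back onto the top strand.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def _stop_free_runs(strand: str, min_codons: int) -> list[tuple[int, int, int]]:
    """(frame, start, end) of stop-free codon runs on one strand, 1-based."""
    runs = []
    for frame in range(3):
        run_start = frame  # 0-based start of the current stop-free run
        for i in range(frame, len(strand) - 2, 3):
            if strand[i:i + 3] in STOPS:
                if (i - run_start) // 3 >= min_codons:
                    runs.append((frame, run_start + 1, i))
                run_start = i + 3  # the next run begins after the stop codon
        # Close the final run at the last complete codon of this frame
        last = frame + ((len(strand) - frame) // 3) * 3
        if (last - run_start) // 3 >= min_codons:
            runs.append((frame, run_start + 1, last))
    return runs


def six_frame_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int]]:
    """Open reading frames in frames +1..+3 and -1..-3, in top-strand
    coordinates; min_codons should be at least 1."""
    seq = seq.upper()
    n = len(seq)
    orfs = [(f"+{f + 1}", s, e) for f, s, e in _stop_free_runs(seq, min_codons)]
    for f, s, e in _stop_free_runs(reverse_complement(seq), min_codons):
        # Position p on the reverse complement is position n - p + 1 on the top strand
        orfs.append((f"-{f + 1}", n - e + 1, n - s + 1))
    return orfs
```

A realistic threshold (the default of 30 codons here is itself an assumption) keeps the short stop-free stretches that occur by chance in random sequence from cluttering the display.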

Several tools are available to measure base composition and to count dinucleotide,
trinucleotide, neighbor and repeat frequencies. A program that predicts RNA 
fingerprint patterns and another that tabulates codon frequencies complete the 
group of programs that analyze composition. 
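The counting behind the composition tools reduces to tallying overlapping words (dinucleotides, trinucleotides) and, for codon frequencies, non-overlapping triplets in one reading frame. A minimal Python sketch with hypothetical function names, using the standard library's Counter:

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Overlapping k-mer counts: k=2 gives dinucleotides, k=3 trinucleotides."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def codon_counts(seq: str, frame: int = 0) -> Counter:
    """Non-overlapping codon counts in one reading frame (0, 1 or 2)."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))


def frequencies(counts: Counter) -> dict[str, float]:
    """Convert raw counts into fractions of the total."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}
```

The distinction matters: in `"AAAA"` the dinucleotide AA occurs three times when words overlap, whereas codons in a frame never share bases.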
Sequence Manipulation 

Sequences may be entered, assembled, edited, reversed, randomized, 
reformatted, translated, back-translated, documented, transferred, or 
encrypted rapidly with a large set of sequence manipulation tools. 
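Several of these manipulations are simple string operations. The Python sketch below (hypothetical names, not the UWGCG programs) shows reversing, complementing, a composition-preserving shuffle (a random permutation of the same bases) and translation with the standard genetic code. The code table is built from the usual 64-letter string in T, C, A, G codon order.

```python
import random

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Standard genetic code; codons enumerated with bases in the order T, C, A, G
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def reverse(seq: str) -> str:
    return seq[::-1]


def complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    return complement(seq)[::-1]


def shuffle(seq: str, seed=None) -> str:
    """Randomize base order while keeping the composition exactly the same."""
    bases = list(seq)
    random.Random(seed).shuffle(bases)
    return "".join(bases)


def translate(seq: str, frame: int = 0) -> str:
    """Translate one reading frame; '*' marks stops, 'X' codons with other symbols."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))
```

For instance, `translate("ATGGCCTAA")` gives `"MA*"`. A shuffled sequence keeps the same base composition, which makes it a convenient control when judging whether a similarity score is higher than chance.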

A screen-oriented editor is available that allows sequences to be entered 
and checked. After a sequence is entered, it may be reentered for 
proofreading. Whenever a reentered base is at variance with the original, the 
terminal bell rings and the position is marked. Existing sequences can be 
edited quickly by moving directly to a sequence position specified by either a 
coordinate or a sequence pattern. The program can reassign the terminal's 
keys to place G, A, T and C conveniently under the fingers of one hand in the
same order as the lanes of a sequencing gel.

Programs are available for changing sequence file format. Sequence data 
from any source can be used in UWGCG programs, and sequence files maintained 
with UWGCG software can be converted for use in other non-UWGCG programs. For 
instance, the programs of Roger Staden(13) or Intelligenetics Inc. (14) could 
be used to assemble a sequence from the sequences of many small subfragments
generated by DNAase I digestion. The assembled sequence could then be 
reformatted for use in any UWGCG program. A program is available that 
transfers sequences to and from other computers. 
Sequence Publication 

A program, Publish, will format sequences into figures. Publish has
alternatives for line size, numbering, scaling, translation and comparison to 
other sequences. Poster is a program that will plot text on figures. 

GENERAL FEATURES OF UWGCG SOFTWARE 
Interactive Style 

Each program is run by simply typing its name. Every parameter required 
by the program is obtained interactively. Questions are answered with a file 
name, a yes, a no, a number, or a letter from a menu. Default answers are 
displayed. Programs tolerate absurd answers and will ask the
question again if, for instance, you name a file that does not exist or if you
use a nonnumeric character when typing a number. Special features, such as
plotting features oriented to publication, are obtained by using an extra word
next to the program's name when the program is run. Thus parameter queries 
are kept to a minimum for the normal use of each program. 
Data 

Both the NIH-GenBank(15) and the EMBL(16) nucleotide sequence data 
libraries are available "on-line" to any UWGCG program. A Search utility will 
locate sequences in the libraries by key word. A Find utility will locate 
library entries containing any specified sequence. A program is available
that installs the new data sent periodically from GenBank and EMBL to update 
their data libraries. 

All of the data in the system are stored in text files that can be read 
and modified easily. Every data file has an English heading describing the 
contents. The data files may be copied by each user for analysis or 
modification. Programs recognize and read user-modified input data 
automatically. Data files can be modified with any text editor. 
Sequence File Structure

Sequences are maintained in files that allow documentation and numbering 
both above and within the sequence. This file format is compatible with both 
of the nucleic acid sequence libraries and has been adopted as the standard 
sequence file format by the data base project at the European Molecular 
Biology Lab. Because genetic manipulations commonly involve linking several 
molecules of known sequence, UWGCG sequence files are designed to support 
concatenation by allowing comments to appear within the sequences at any 
location. Coding sequences or the boundaries between cloning vector and 
insert, for instance, can be marked within the sequence itself for immediate 
identification. 
Sequence Symbols 

All possible nucleotide ambiguities and all standard one-letter amino
acid codes are part of the UWGCG symbol set that includes all alphabetic
characters plus five additional characters. The proposed IUB-IUPAC standard
nucleotide ambiguity symbols (17) are used for the mapping, searching and
comparison programs. Lower case characters are used in sequences to indicate
uncertainty as distinct from ambiguity. This allows the entire lexicon of
symbols to be reused with the same meaning, but with the prefix "maybe-." This
reuse of the symbol set in lower case makes the uncertainty symbols more 
complete, understandable and visible. 
Symbol Comparison 

Sequence analysis programs generally make comparisons between sequence 
symbols (bases or amino acids) in order to find enzyme sites, create 
alignments, locate inverted repeats etc. These symbol comparisons are handled 
in several ways. 

Symbol comparisons for alignment, comparison and secondary structure 
analysis are made by looking up a value in a symbol comparison table for the 
quality of the match. The table might contain 1's for matches and 0's for
mismatches. If amino acids are being compared, however, a real number could
be assigned at each position based on some previously assigned chemical
similarity of the pair of residues or on the mutational distance between their
codons. Standard symbol tables are provided by UWGCG, but the system is
designed to allow each user to specify his own values.
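A symbol comparison table of this kind can be sketched as a dictionary keyed by symbol pairs, defaulting to 1 for a match and 0 for a mismatch, with user-supplied values laid over the defaults. The override format used here (lines such as `A G 0.5`, with any other line, such as an English heading, skipped) is a hypothetical stand-in, not the UWGCG file format.

```python
def build_table(symbols: str, overrides: str = "") -> dict[tuple[str, str], float]:
    """Comparison table: 1 for a match, 0 for a mismatch, unless overridden."""
    table = {(a, b): (1.0 if a == b else 0.0) for a in symbols for b in symbols}
    for line in overrides.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # not a 'symbol symbol value' entry
        a, b = fields[0].upper(), fields[1].upper()
        try:
            value = float(fields[2])
        except ValueError:
            continue  # e.g. a three-word heading line
        table[(a, b)] = table[(b, a)] = value  # keep the table symmetric
    return table


def ungapped_score(x: str, y: str, table: dict[tuple[str, str], float]) -> float:
    """Sum the table values over aligned symbol pairs (no gaps)."""
    return sum(table[(a, b)] for a, b in zip(x.upper(), y.upper()))
```

Because every comparison goes through the table lookup, swapping in different values changes the behaviour of any program that uses the table without touching the program's code.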

Symbol comparisons for mapping and searching operations in nucleic acids
are made by converting the IUB-IUPAC symbols into a binary code. The bits of
this code represent G, A, T and C, with ambiguity symbols causing more than one
bit to be set. A group of library functions identifies overlap between the bits
for each IUB-IUPAC symbol.
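The bit-coding scheme can be illustrated in Python. The bit assignment below (G, A, T, C as 1, 2, 4, 8) is an assumption, since only the four represented bases are specified. The ambiguity codes follow the standard IUB-IUPAC definitions. Two symbols are compatible when their bit sets overlap, so a site written with ambiguity codes can be searched for directly. Lower case is folded to upper case, treating an uncertain base like the certain one.

```python
# One bit per base (hypothetical bit order)
G, A, T, C = 1, 2, 4, 8
IUPAC = {
    "G": G, "A": A, "T": T, "U": T, "C": C,
    "R": A | G, "Y": C | T, "M": A | C, "K": G | T, "S": C | G, "W": A | T,
    "B": C | G | T, "D": A | G | T, "H": A | C | T, "V": A | C | G,
    "N": A | C | G | T,
}


def symbols_overlap(x: str, y: str) -> bool:
    """True if the two symbols can stand for at least one common base."""
    return bool(IUPAC[x.upper()] & IUPAC[y.upper()])


def site_matches(window: str, site: str) -> bool:
    return len(window) == len(site) and all(
        symbols_overlap(s, p) for s, p in zip(window, site))


def find_site(seq: str, site: str) -> list[int]:
    """1-based positions where a (possibly ambiguous) site occurs."""
    n = len(site)
    return [i + 1 for i in range(len(seq) - n + 1) if site_matches(seq[i:i + n], site)]
```

For example, the AvaI recognition sequence CYCGRG is found at position 2 of `"ACTCGAGTT"`, because T falls within Y and A falls within R.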
Documentation 

Documentation is available both in printed form and on the terminal 
screen. A 350 page manual describes the operation of each program in detail, 
gives practical considerations and shows what will appear on the screen during 
a session with the program. Output files and plots are shown for the session. 
The data for the session shown in the documentation are included with the 
system so that each program's operation can be checked. The "on-line"
documentation is the same as the manual, but can be changed immediately when a 
program is modified. 

All programs write output to files that are completely documented and
sensibly organized for input to other programs. The input data, the program 
and the parameters used are clearly identified in every output file. 
Procedure Library 

UWGCG programs are written largely as calls to a library of 250
procedures designed to manipulate biological sequences. These procedures use 
data and file structures which have been designed to simplify program 
modification. For instance, standard operations such as reading sequences 
from files are always handled by a single library procedure. Thus a change in 
sequence file format requires only one subroutine to be modified for the new 
format to be acceptable to all of the programs in the system. Command 
procedures are available to help modify the library. The procedure library 
can be used by programs written in any language. 

DISTRIBUTION OF UWGCG SOFTWARE 
Intent 

The intent of UWGCG is to make its software available at the lowest 
possible cost to as many scientists as possible. 
Fees 

A fee of $2,000 for non-profit institutions or $4,000 for industries is 
being charged for a tape and documentation for each computer on which UWGCG 
software is installed. While no continuing fee is required, UWGCG software, 
like the field it supports, is changing very rapidly. A consortium of 
industries and academic laboratories is planned to support the project in the 
future. The consortium will entitle its members to periodic updates and to 
influence the direction of new programming undertaken by UWGCG in return for a 
pledge of continuing financial support. 
Copyrights

UWGCG retains the copyrights to all of its software and UWGCG must be
contacted before all or any part of its software package is copied or
transferred to any machine. UWGCG is, however, mandated to provide research
tools to help scientists working in the area of molecular genetics and we are
glad to see our source codes become the basis of further programming efforts
by other scientists. Copyright can be removed for any program modified by
more than 25% of its original effort.
Tape Format 

The UWGCG package is usually distributed in VAX/VMS "backup" format on a
9-track magnetic tape recorded at 1600 bits/inch. The system consists of
about 1000 files using about 20,000 blocks at 512 bytes/block. The current
versions of the GenBank and EMBL nucleotide sequence data bases are normally
included, which add another 3,000 files and require another 20,000 blocks.

Upon request UWGCG will make a card image tape of all of the Fortran 77
programs and procedures for reading on computers other than the VAX. The card
image tape is usually provided at 1600 bits/inch with 80 characters/record and
10 records/block. Adaptation of UWGCG software to systems other than VAX/VMS
may take considerable effort.
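For a sense of scale, the block counts can be converted to bytes. The sketch assumes the standard 512-byte VAX/VMS disk block and takes the card-image layout (80-character records, 10 records per block) from the text above.

```python
VMS_BLOCK_BYTES = 512     # one VAX/VMS disk block
software_blocks = 20_000  # UWGCG programs, procedures and data
database_blocks = 20_000  # GenBank and EMBL libraries, when included

software_bytes = software_blocks * VMS_BLOCK_BYTES
total_bytes = (software_blocks + database_blocks) * VMS_BLOCK_BYTES
print(f"software only: {software_bytes:,} bytes")   # prints "software only: 10,240,000 bytes"
print(f"with databases: {total_bytes:,} bytes")     # prints "with databases: 20,480,000 bytes"

# Card-image tape: 80-character records, 10 records per tape block
card_block_bytes = 80 * 10
print(card_block_bytes)  # prints 800
```

On these assumptions the software alone occupies roughly 10 MB of disk, and about 20 MB once the sequence libraries are added.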

Equipment Required 

UWGCG programs and command procedures will run on a Digital Equipment 
Corporation (DEC) VAX computer that is using version 3.0 or greater of the DEC 
VMS operating system. A tape drive is necessary; a floating point accelerator 
and a DEC Fortran compiler are helpful, but not required. All programs can be 
run from a DEC VT52 or VT100 terminal. Seven programs, as noted in Table 1,
require a Hewlett Packard 7221 terminal plotter wired in series with the
terminal. Several utilities support a daisy wheel compatible printer attached
to the terminal's pass-through port; however, all programs write output files
suitable for printing on any standard device.
Inquiries 

Inquiries may be sent to John Devereux at the Laboratory of Genetics, 
University of Wisconsin, Madison, WI, USA 53706, (608) 263-8970. UWGCG is not 
licensed to distribute Fold (7), but the UWGCG implementation is available from
Michael Zuker, Division of Biological Sciences, National Research Council of
Canada, 100 Sussex Drive, Ottawa, Canada, K1A 0R6, (613) 992-4182.
A cDNA coding for the rat serine proteinase hepsin was isolated and its nucleotide sequence has been determined. The cDNA was 1739 nucleotides long and contained an open reading frame encoding a protein consisting of 416 amino-acid residues. The deduced amino-acid sequence of the rat enzyme was very similar to that of human hepsin, sharing a high degree of amino-acid identity. Hydropathy plots reveal the presence of a short hydrophobic region close to the N-terminus, characteristic of a transmembrane domain, which anchors the proteinase on the cell surface. The predicted sequence contains the His, Asp and Ser residues which make up the catalytic triad common to all serine proteinases.



Hepsin is a membrane-bound serine proteinase 
which was originally identified from cDNA clones iso- 
lated from human liver libraries [1]. The role of this 
proteinase is not known and the protein is poorly 
characterized with respect to its physical characteristics 
and substrate specificity. Human hepsin deduced from 
the encoding cDNA consists of 417 amino-acid residues 
and contains a short hydrophobic region near the 
amino-terminus believed to be a membrane spanning 
region. Immunostaining studies of cultured HepG2 cells 
demonstrate that hepsin is localized on the outer cell 
membrane surface with its NH2-terminal side facing 
the cytosol and the carboxyl or catalytic side at the cell 
surface [2,3]. In this paper we report the cloning and 
sequence of the rat liver hepsin gene and compare 
structural similarities with human hepsin and other 
serine proteinases. 

A rat liver cDNA library (Stratagene, No. 936507) 
was screened with a labeled DNA probe corresponding 
to 137 nucleotides at the 3'-end of the rat hepsin
cDNA. This cDNA probe had previously been isolated 
attached to a rat 5-alpha-reductase cDNA [4]. Six 
positive clones were isolated after screening about 
4.5 • 10^ phage plaques. Restriction analysis of the 
DNA from the positive plaques revealed that the largest 
insert was almost 1800 nucleotides in length. This
EcoRI fragment was then subcloned into the plasmid
pBSK- (Stratagene). The DNA insert was self-ligated,
fractionated by sonication, subcloned into M13mp18
and both strands were sequenced using the dideoxy
chain termination method [5].

The nucleotide sequence and the deduced amino-acid
sequence for rat hepsin are shown in Fig. 1. The
cDNA presented here is 1739 nucleotides in length and 
contains 184 nucleotides of untranslated sequence at 
the 5'-end, an open reading frame consisting of 1248 
nucleotides encoding a protein of 416 amino-acid 
residues, a TGA stop codon, 304 nucleotides at the 
3'-end and 33 adenine residues believed to make up 
the poly(A) tail. Based on the cDNA sequence, rat 
hepsin would have a predicted molecular mass of 44 930 
Da and contains one potential N-linked carbohydrate
attachment site at Asn-111. 
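The reported lengths can be cross-checked with simple arithmetic. The 416 codons give the 1248-nucleotide open reading frame. The 5' untranslated region, open reading frame, TGA stop codon and 304-nucleotide 3' region sum to 1739 nucleotides, the stated cDNA length. On this reading, the 33-residue poly(A) tail is not included in the 1739 figure, but that is an inference from the arithmetic, not a statement in the source. The two chains produced by zymogen cleavage likewise add up to the full 416 residues.

```python
five_prime_utr = 184
codons = 416
orf = codons * 3            # nucleotides in the open reading frame
stop_codon = 3              # TGA
three_prime_region = 304
poly_a = 33

print(orf)                                    # prints 1248
cdna_without_tail = five_prime_utr + orf + stop_codon + three_prime_region
print(cdna_without_tail)                      # prints 1739
print(cdna_without_tail + poly_a)             # prints 1772

# Zymogen cleavage between Arg-161 and Ile-162
noncatalytic, catalytic = 161, 255
print(noncatalytic + catalytic)               # prints 416
```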

Alignment of the deduced amino-acid sequence of 
rat and human hepsin is shown in Fig. 2. The aligned 
amino-acids reveal a large degree of homology with 
about 89% of the amino-acid residues being identical. 
Rat hepsin is one amino-acid residue shorter at the 
amino-terminus than the human enzyme. Like human 
hepsin, rat hepsin contains a 27-amino-acid hydropho- 
bic region which is characteristic of a transmembrane 
domain [6]. This region is believed to anchor the
proteinase on the outer cell membrane in a specific
orientation with the catalytic domain exposed to the
extracellular environment. Hepsin does not possess an
obvious signal sequence but does appear to be synthesized
as an inactive precursor with an Arg-161-Ile-162
cleavage site involved in zymogen activation. Cleavage of
this peptide bond results in a noncatalytic polypeptide
consisting of 161 amino-acid residues and a
carboxy-terminal catalytic chain consisting of 255 residues that
contains several highly conserved regions common to 
serine proteinases. By comparing the hepsin sequence 
presented here to other well-characterized serine
proteinases, one can predict that the two conserved
cysteine residues at positions 152 and 276 are involved in
a disulfide linkage between the noncatalytic and
catalytic chains of hepsin. Many interesting similarities of
hepsin to other serine proteinases have already been
considered by Leytus et al. [1] in their description of
human hepsin.

Fig. 1. cDNA sequence and predicted amino-acid sequence of rat hepsin. Nucleotides are numbered at left and amino-acid residues ... predicted transmembrane domain is underlined and (v) represents the proposed zymogen activation cleavage site. The catalytic ... starred and a potential N-linked glycosylation site is indicated by (•).

Fig. 2. Comparison of the deduced amino-acid sequences of rat and human hepsin. Residues in the human sequence that are identical to those of the rat are represented by a single dot and differences are indicated.

Proteinases are involved in many biological pro- 
cesses such as blood coagulation, fibrinolysis and com- 
plement activation [7]. However, the biological role of 
hepsin remains unclear since its enzymatic specificity
and physiological substrates are presently unknown.
Analysis of the amino-acid sequence of hepsin reveals 
several key residues which are similar to trypsin espe- 
cially in the highly conserved sequences which sur- 
round the catalytic site. Although substrate specificity 
is unknown, the presence of an Asp at position 346 
would suggest that hepsin exhibits trypsin-like activity
since a similar residue is found in trypsin at the bottom
of the substrate binding pocket [8]. The precise role of
this enzyme will remain a subject of speculation until 
the native enzyme can be purified and further charac- 
terized. 
Localization of the mosaic transmembrane serine protease corin to
heart myocytes 

John D. Hooper, Anthony L. Scarman, Belinda E. Clarke, John F. Normyle and Toni M. Antalis
Corin cDNA encodes an unusual mosaic type II transmembrane serine protease, which possesses, in addition to a
trypsin-like serine protease domain, two frizzled domains, eight low-density lipoprotein (LDL) receptor domains, 
a scavenger receptor domain, as well as an intracellular cytoplasmic domain. In in vitro experiments, recombinant 
human corin has recently been shown to activate pro-atrial natriuretic peptide (ANP), a cardiac hormone essential 
for the regulation of blood pressure. Here we report the first characterization of corin protein expression in heart 
tissue. We generated antibodies to two different peptides derived from unique regions of the corin polypeptide, 
which detected immunoreactive corin protein of approximately 125-135 kDa in lysates from human heart
tissues. Immunostaining of sections of human heart showed corin expression was specifically localized to the 
cross striations of cardiac myocytes, with a pattern of expression consistent with an integral membrane 
localization. Corin was not detected in sections of skeletal or smooth muscle. Corin has been suggested to be a 
candidate gene for the rare congenital heart disease, total anomalous pulmonary venous return (TAPVR) as the 
corin gene colocalizes to the TAPVR locus on human chromosome 4. However examination of corin protein 
expression in TAPVR heart tissue did not show evidence of abnormal corin expression. The demonstrated corin 
protein expression by heart myocytes supports its proposed role as the pro-ANP convertase, and thus a potentially
critical mediator of major cardiovascular diseases including hypertension and congestive heart failure. 

Keywords: serine protease; corin; heart; pro-atrial natriuretic peptide (pro-ANP); TAPVR.



Serine proteases are found in all living organisms, ranging from 
viruses to humans [1], where they serve important and varied 
biological functions in situations requiring limited proteolysis. 
Their activities impact on areas as diverse as hemostasis, tissue 
remodelling and wound repair, inflammation, angiogenesis, 
fibrinogenesis and fibrinolysis. Cell surface serine proteases 
have been associated largely with extracellular matrix degra- 
dation, but there are emerging roles for these proteases in 
generating bioactive matrix protein fragments, influencing the 
release, the activation and bioavailability of growth factors and 
in shedding of cell surface proteins [2-6].

Many serine proteases are mosaic proteins comprising 
multiple, structurally distinct domains necessary for regulating 
enzymatic activity. Circulating serine proteases of the blood 
coagulation (e.g. prothrombin and factor X) [7], fibrinolysis 
(e.g. plasminogen activators) [8] and complement (e.g. C1r and
C1s) [9] systems are well characterized examples of mosaic
proteins. While the vast majority of known serine proteases are 
secreted, more recently some serine proteases have been found 
to possess integral transmembrane domains. The proteins 
enteropeptidase [10], hepsin [11] and most recently, TMPRSS2
[12] are examples of mosaic serine proteases with type II
transmembrane domains. These enzymes are positioned on the
plasma membrane via a membrane spanning domain close to
the N-terminus. In addition to membrane spanning and protease
domains, enteropeptidase also contains two low-density
lipoprotein (LDL) receptor domains, a meprin-like domain, two
C1r-like domains and a truncated scavenger receptor domain.
An LDL receptor domain and a scavenger receptor domain
have also been identified in TMPRSS2 [12]. The functions of
these domains have not been determined.

Correspondence to T. M. Antalis, Queensland Institute of Medical
Research, Post Office Royal Brisbane Hospital, Brisbane, 4029,
Queensland, Australia. Fax: + 61 73362 0107, Tel.: + 61 73362 0312,
E-mail: toniA@qimr.edu.au

Abbreviations: LDL, low-density lipoprotein; ANP, atrial natriuretic
peptide; TAPVR, total anomalous pulmonary venous return; tPA,
tissue-type plasminogen activator; uPA, urokinase-type plasminogen
activator; ang, angiotensin; ACE, angiotensin converting enzyme.
(Received 24 July 2000, revised 12 September 2000, accepted
4 October 2000)

Serine proteases play important roles in several aspects of 
heart physiology and cardiovascular disease [13]. The mast cell 
serine protease chymase is believed to be the major converter of 
angiotensin (ang)I to angII in human heart tissue [14]. The
involvement of angII in normal cardiac function as well as in
heart ailments such as hypertrophy, heart failure and ischaemic 
heart disease is indicated by the finding that inhibition of the 
angiotensin converting enzyme (ACE), leads to beneficial 
outcomes for sufferers of these diseases [15]. However, ACE 
inhibitors block only 10-20% of angI conversion in heart tissue
whereas the remaining activity is blocked by serine protease 
inhibitors [16]. The fibrinolytic serine proteases tissue-type 
plasminogen activator (tPA) and urokinase-type plasminogen 
activator (uPA) are also thought to be involved in the 
progression of heart disease. uPA is present at significantly 
elevated levels in the atherosclerotic lesions responsible for 
myocardial infarction and failure [17]. The reduction in tPA 
from arteriolar smooth muscle cells is linked to the develop- 
ment of coronary artery disease in transplanted hearts [18]. 

Our own work and that of Yan et al. [19] have led to the recent
cloning of a cDNA encoding a novel, multidomain type II 
transmembrane serine protease from human heart. The 
predicted protein, corin, comprises two frizzled domains, eight
LDL receptor domains, a truncated scavenger receptor domain,
in addition to the extracellular trypsin-like serine protease
domain [19]. Recent expression of recombinant corin demonstrates
that it possesses pro-atrial natriuretic peptide (ANP)
convertase activity [20], and thus may play a critical role in the
regulation of hypertension. In situ hybridization studies of
mouse embryonic heart showed that corin mRNA was
expressed as early as day 9.5 and maintained its expression
through the adult animal [19]. The corin gene was mapped to
human chromosome 4p12-13 [19], near the locus for the
congenital heart disease, total anomalous pulmonary venous
return (TAPVR). Here we present data describing for the first
time native corin protein expression and localization in human
heart.

MATERIALS AND METHODS 

Identification of corin cDNA by homology cloning 

Homology cloning was performed by RT-PCR using degenerate 
oligonucleotides corresponding to conserved regions of serine 
proteases [21-24]. Total RNA was isolated from SI a cells [25] 
following treatment with TNFα and cycloheximide for 4 h.
RNA (5 μg) was reverse transcribed at 42 °C using AMV
reverse transcriptase (Promega, Madison, WI) in the presence of
oligo dT12-18 (0.25 μg μL⁻¹) (Pharmacia Biotech, Sweden),
50 mM Tris/HCl, pH 8.3, 50 mM KCl, 10 mM MgCl2, 10 mM
dithiothreitol and 0.5 mM spermidine in a total volume of
20 μL. PCR was performed using 1 μL of the reverse
transcriptase reaction mixture, 500 ng of each primer, 10 mM
Tris/HCl, pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 0.2 mM dNTPs
and 1-2 units of Taq polymerase (Perkin Elmer). The primers
were as follows: forward, 5'-ACAGAATTCTGGGTIGTIACIGCIGCICAYTG-3';
reverse, 5'-ACAGAATTCAXIGGICCICCI(C/G)(T/A)XTCICC-3'
(where X = A or G, Y = C or T and I = inosine).
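The size of each degenerate primer mixture follows from multiplying the number of allowed bases at every position. In this Python sketch, the letter codes follow the primer legend (X = A or G, which the IUB-IUPAC code writes as R; Y = C or T), and groups such as (C/G) count as one position. Inosine is treated as contributing no degeneracy because each primer molecule carries a single inosine at that position. The function names are hypothetical.

```python
import itertools

# Codes as defined in the primer legend; I (inosine) is a single nucleotide
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "I": "I", "X": "AG", "Y": "CT"}


def parse_primer(primer: str) -> list[str]:
    """Allowed bases at each position; a '(C/G)' group is one position."""
    positions, i = [], 0
    while i < len(primer):
        if primer[i] == "(":
            close = primer.index(")", i)
            positions.append(primer[i + 1:close].replace("/", ""))
            i = close + 1
        else:
            positions.append(CODES[primer[i]])
            i += 1
    return positions


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in the mixture."""
    total = 1
    for choices in parse_primer(primer):
        total *= len(choices)
    return total


def expand(primer: str) -> list[str]:
    """Every individual sequence in the mixture."""
    return ["".join(p) for p in itertools.product(*parse_primer(primer))]
```

On this counting, the forward primer is a 2-fold mixture (only the Y position varies), and the reverse primer is a 16-fold mixture (four two-way positions).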

Cycling conditions: 2 cycles of 94 °C for 2.5 min, 35 °C for
2.5 min and 72 °C for 3 min, followed by 33 cycles of 94 °C
for 2.5 min, 57 °C for 2.5 min and 72 °C for 3 min, with a final
extension at 72 °C for 7 min. PCR products of approximately
450 bp were ligated into pGEM-T (Promega, Madison, WI,
USA), cloned and analysed by DNA sequencing. A DNA
fragment was identified which represented the partial corin
sequence (nucleotides 334-748). The cDNA was extended 333
nucleotides towards the 5' end by screening a cDNA library
using two rounds of PCR and the nested oligonucleotides
ATC2P3 and ATC2P1 in combination with the vector specific
primer T7. The 3' end was extended to nucleotide 976 by two
rounds of PCR and the nested oligonucleotides ATC2P4 and
ATC2P5 in combination with the vector specific primer T3. The
primer sequences are given below.
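Summing the hold times gives the minimum length of the cycling program. Ramp times between temperatures are ignored here, which is an assumption since the source does not give them. A small Python sketch with a hypothetical helper, program_minutes:

```python
def program_minutes(stages, final_extension):
    """Total hold time: for each stage, cycles x (sum of step times), plus final extension."""
    return sum(cycles * sum(steps) for cycles, steps in stages) + final_extension


stages = [
    (2, (2.5, 2.5, 3.0)),   # 94 °C, 35 °C, 72 °C (minutes)
    (33, (2.5, 2.5, 3.0)),  # 94 °C, 57 °C, 72 °C (minutes)
]
total = program_minutes(stages, final_extension=7.0)
print(total, "min")  # prints 287.0 min
```

That is about 4.8 hours of hold time before any allowance for heating and cooling between steps.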

ATC2P1: 5'-GCGTGTCTGCATGAACACTG-3'; ATC2P2:
5'-ATGCCAAGCACCACTTTCCA-3'; ATC2P3: 5'-ATAGTCCACCACTGCTCGAC-3';
ATC2P4: 5'-TTAAGCTGCAAGAGGGAGAG-3'.

The DNA sequence of this cDNA has been deposited in the DDBJ/GenBank/EMBL database under accession no. AF113248.

Heart tissue specimens 

Tissues from explanted hearts with terminal heart failure were either snap frozen in liquid nitrogen (for RNA and protein analyses) or processed for routine histological examination. Six paraffin-embedded blocks of human heart tissue were obtained from autopsy cases with acute myocardial infarction. These blocks included both viable and nonviable myocardium. Procedures were in accordance with guidelines established by the National Health and Medical Research Council of Australia, Ethics Approval number EC9876(n).

Northern and Poly(A)⁺ RNA dot blot analyses

Human multiple tissue Northern blots (Clontech, Palo Alto, CA, USA) contained 2 µg of poly(A)⁺ RNA per lane. The blots were hybridized with a ³²P-dCTP-labeled, EcoRI-digested DNA fragment encoding corin cDNA in ExpressHyb (Clontech) solution at 65 °C and washed to a final stringency of 0.2 × NaCl/Cit, 0.1% SDS at 65 °C. The blot was reprobed with β-actin as a measure of loading in each lane. For the mouse tissue blot, total RNA was purified from mouse tissues, separated by denaturing gel electrophoresis and transferred to Hybond-N nylon membranes as described [26]. The blot was hybridized with the radiolabelled human corin DNA probe under lower stringency conditions in ExpressHyb solution at 55 °C and washed to a final stringency of 1 × NaCl/Cit, 0.1% SDS at 55 °C. The mouse tissue blot was stained with ethidium bromide to confirm RNA loading in each lane.

Production of affinity-purified antipeptide polyclonal antibodies

Rabbit polyclonal antibodies were generated against corin-specific peptides derived from nonhomologous hydrophilic regions within the corin amino-acid sequence. Two peptides, each containing a cysteine residue incorporated at the C-terminus, were synthesized (Auspep, Parkville, Australia) and conjugated to keyhole limpet hemocyanin using m-maleimidobenzoic acid N-hydroxysuccinimide ester. The peptides were: A1, IQEQEKEPRWLTLHSNWE-C; A2, GHMGNKMPFKLQEGE-C. Rabbit antisera were peptide-affinity purified using SulfoLink coupling gel (Pierce, Rockford, IL). The specificity of each antibody was tested against the immunogenic peptide by ELISA.

Western blot analysis 

Frozen heart tissue (100 mg) was homogenized in lysis-binding buffer (Dynabeads mRNA Direct kit, Dynal) and spun at 13 000 × g for 2 min. The protein pellet was dissolved in reducing SDS sample buffer for Western blot analysis. Proteins were separated by SDS/PAGE on 10% acrylamide gels and transferred electrophoretically to Hybond-P membranes (Amersham, Aylesbury, UK). Membranes were blocked with 5% nonfat skim milk powder in Tris/NaCl (10 mM Tris/HCl, pH 7.0, 150 mM NaCl), incubated with affinity-purified antipeptide antibody, then with horseradish peroxidase-conjugated sheep anti-(rabbit Ig) secondary antibody, and visualized by enhanced chemiluminescence (Amersham, Aylesbury, UK).

Immunohistochemistry 

Paraffin sections (5 µm) of formalin-fixed human heart were deparaffinized, then rehydrated before antigen retrieval in boiling 10 mM citric acid buffer, pH 6. After cooling, endogenous peroxidase activity was inhibited by a 10 min incubation in 1% hydrogen peroxide. Nonspecific antibody binding was blocked by incubating the sections in 4% nonfat skim milk powder in NaCl/Pi for 15 min, followed by 10% normal goat serum for 20 min. Affinity-purified anti-corin A1 (1:100; 150 µg·mL⁻¹) or A2 antibodies (1:50; 20 µg·mL⁻¹) were applied and incubated overnight in a humidified chamber at room temperature. Controls included sections incubated with no primary antibody or with antibody that had been preadsorbed for 2 h at room temperature with 1 µg of the antigenic peptide. Following incubation with prediluted biotinylated goat anti-(rabbit Ig) Ig (Zymed, San Francisco, CA, USA), streptavidin-horseradish peroxidase (Zymed) was applied and color was developed using the chromogen 3,3'-diaminobenzidine with hydrogen peroxide as substrate. The sections were counterstained in Mayer's haematoxylin.

Fig. 1. Corin expression in human and mouse tissues. (A) Northern blot analysis of RNA isolated from a range of normal human tissues probed with ³²P-labelled corin cDNA. The levels of β-actin mRNA are shown as a control for loading. (B) Northern blot analysis of corin mRNA expression in a range of mouse tissues probed with ³²P-labelled human corin cDNA at reduced stringency. The levels of 18S ribosomal RNA are shown as a control for loading.



RESULTS AND DISCUSSION 

Isolation of human corin cDNA by homology cloning 

A PCR-based homology cloning approach was employed to identify serine protease cDNAs expressed by the S1a cell line [25], which is resistant to tumor necrosis factor-α-induced apoptosis. Degenerate primers designed to anneal to cDNA encoding the conserved regions surrounding the catalytic histidine and serine residues of serine proteases [21-23] were used to amplify and then clone a range of DNA fragments of approximately 450 bp. One clone, designated ATC2, was found to encode a novel serine protease. The cDNA was extended in the 5' and 3' directions by library screening, and the DNA sequence was deposited in the DDBJ/GenBank/EMBL database (accession no. AF113248). This sequence was subsequently found to be 100% identical to a recently reported cDNA encoding the serine protease corin (accession no. AF133845) [19].



Corin mRNA is strongly expressed in heart 

The tissue distribution of corin mRNA was examined by Northern blot analyses. Analysis of poly(A)⁺ RNA from 16 normal human tissues showed a single transcript of approximately 5.1 kb, detectable only in human heart (Fig. 1A). Examination of a range of mouse tissues likewise showed expression of a corin mRNA of approximately 5.1 kb only in mouse heart (Fig. 1B).



Fig. 2. Corin protein expression in human heart tissue by Western blot analysis. Immunoreactive corin protein of 125-135 kDa is detected in a protein lysate prepared from human heart tissue (Patient #7684) but not in a lysate of corin-negative HeLa cells. The blot was probed with the anti-corin antibody AbA1 and visualized using enhanced chemiluminescence. The protein standards in kDa are as indicated.
Fig. 3. Corin is localized to human heart myocytes by immunostaining. Immunohistochemical staining of human heart tissues was performed using the affinity-purified anti-corin peptide A1 or A2 polyclonal antibodies as primary antibodies. (A) A longitudinal section of a representative heart tissue from a transplant recipient (Patient #7684) stained with AbA1, showing intense staining in the cardiac myocytes; (B) as (A) except that the primary antibody was preadsorbed with the immunogenic peptide, A1, for 2 h; (C) the same tissue as (A) except stained with the weaker-staining antibody, AbA2. Apparent staining at the poles of the nuclei consists of deposits of the brown lipochrome pigment, lipofuscin. (D) The same tissue as (A-C) processed in the absence of primary antibody; (E) a longitudinal section of normal myocardium from a heart that contained an acute infarct elsewhere (Patient #A4-99R) stained with AbA1, showing intense staining corresponding to the cross striations; (F) staining of the same heart tissue as (E) with AbA1, showing intense staining in cross section. Photomicrographs (A-E) were taken at an original magnification of 100×.



Fig. 4. Corin expression in neonate heart with TAPVR. Immunohistochemical staining of human neonate heart tissues was performed using the affinity-purified anti-corin peptide A1 polyclonal antibody as the primary antibody. (A) and (C) Longitudinal sections of TAPVR heart tissue showing staining in the cardiac myocytes, corresponding to the cross striations; (B) and (D) longitudinal sections of a normal neonate heart showing a similar staining pattern in the cardiac myocytes. Photomicrographs (A) and (B) were taken at an original magnification of 100× and (C) and (D) at an original magnification of 40×.

Anti-corin antibodies detect corin in heart lysates

We generated polyclonal antibodies to two different peptides derived from unique regions of the corin polypeptide sequence in order to investigate its expression and localization in the heart. The first peptide lies in a unique region within the serine protease catalytic domain between the conserved Asp and Ser amino-acid residues (AbA1), and the second lies within the scavenger receptor domain (AbA2). Immunoblot analysis of corin protein expression in human heart protein lysates showed a major immunoreactive band of 125-135 kDa (Fig. 2), which was not present in lysates from the negative control HeLa cell line. This molecular mass is slightly lower than that reported (≈150 kDa) for recombinant V5/His6-tagged corin expressed by human embryonic kidney 293 cells [20]. As the mature corin zymogen has a calculated mass of 116 kDa [19], it is likely that the mature corin polypeptide undergoes post-translational processing, possibly glycosylation. Consistent with this, there are 19 predicted N-linked glycosylation sites in the extracellular domains of corin [19].
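Predicted N-linked glycosylation sites of the kind counted above are commonly located by searching for the N-X-S/T consensus sequon, in which X is any residue except proline. The sketch below assumes that rule; the source does not say which prediction method produced the figure of 19 sites, and the test sequence is a made-up fragment, not corin. Because only extracellular sites are relevant, a caller would pass the extracellular region alone.

```python
import re

# N-X-S/T sequon with X != P. The zero-width lookahead lets overlapping
# sequons (e.g. "NNTT") be reported separately.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(seq: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]

if __name__ == "__main__":
    toy = "MKNGSAPNPTLNVT"  # hypothetical fragment, not a corin sequence
    print(find_sequons(toy))  # "NPT" at position 8 is skipped because X = P
```

A sequon marks only a potential site; whether a given Asn is actually glycosylated has to be established experimentally.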



Corin is expressed by human heart myocytes 

To investigate the localization of corin expression in human heart, immunohistochemical analyses were performed on human adult heart tissues. Corin was abundantly expressed in cardiac myocytes, with intense brown staining associated with cross striations seen in longitudinally sectioned myofibers (Fig. 3A). In some areas there was accentuation of the plasma membrane, consistent with an integral membrane localization of corin. The same pattern of staining was observed in sections taken from all areas of the myocardium. Control slides using the AbA1 polyclonal antibody in the presence of competing A1 peptide showed absence of this specific staining pattern (Fig. 3B). An identical, albeit weaker, staining pattern was observed in experiments performed using the second corin-specific antibody (AbA2) (Fig. 3C). No staining was detected in the absence of primary antibody (Fig. 3D). Staining of a section of viable myocardium from a heart containing an acute myocardial infarct showed similar intense staining of the striations in cardiac myocytes (Fig. 3E) and a pinhead-like dot pattern when viewed in cross section (Fig. 3F). Necrotic heart tissue showed similar but much less intense staining (data not shown). Corin was not detected in sections of skeletal or smooth muscle (data not shown), suggesting that the function of corin is specifically related to cardiac muscle.



Corin protein expression in a patient with the congenital heart disease TAPVR

The molecular mechanisms responsible for the developmental defect associated with the rare congenital heart disease TAPVR are not known. The location of the corin gene on human chromosome 4p12-13 [19] and the localization of the TAPVR locus to a 30-centimorgan interval on 4p13-q12 [26] suggested that corin may be a candidate for the TAPVR gene [19]. If corin plays a role in TAPVR, its expression may be lost or altered in TAPVR heart tissue. To explore this possibility, we examined corin protein expression in a TAPVR heart. The pattern of corin expression detected in this heart tissue (Fig. 4A,C) was similar to that observed in the adult heart and was identical to the pattern of corin staining in an age-matched neonate control heart (Fig. 4B,D). While these data do not support a role for corin in TAPVR, they do not exclude the possibility that TAPVR is associated with more subtle alterations to the corin gene, for example point mutations, that would not be detected by this method.

Fig. 5. Diagram showing the domain structure of corin compared with other mosaic integral membrane proteins. The domains are as indicated. The catalytic serine protease residues are circled. The disulfide bond linking the catalytic and pro-regions is marked.

Corin homology to other type II transmembrane proteases 

As illustrated in Fig. 5, corin is a mosaic integral membrane protein possessing discrete domains. The intracellular cytoplasmic domain contains two potential protein kinase C phosphorylation sites, which may provide a mechanism for signal relay to or from the cell surface. Corin contains two frizzled domains. These domains function in other molecules as receptors for Wnt proteins, which are implicated in signal transduction during development [28]. Corin possesses eight LDL receptor domains, which can mediate uptake of LDLs [29] and have also been shown to be involved in binding and internalization of protease/inhibitor complexes [30]. LDLs regulate the transport of cholesterol and play a major role in the development of heart disease. Corin possesses a scavenger receptor domain, which, in other proteins, binds polyanionic molecules including modified lipoproteins, cell surface lipids and some sulfated polysaccharides [31]. The trypsin-like serine protease domain is located at the C-terminus.

Corin bears similarity to other known members of the integral membrane serine proteases, as illustrated in Fig. 5. The corin serine protease domain is highly homologous to that of enteropeptidase, a multidomain integral-membrane serine protease found in the brush border of the intestine [32]. Enteropeptidase functions to activate digestive pancreatic enzymes released into the intestine. Activation of this cascade is critical, as illustrated by the life-threatening intestinal malabsorption that accompanies congenital deficiency of enteropeptidase [32]. Other proteases with homology to the corin serine protease domain are the integral-membrane serine proteases TMPRSS2 and hepsin. Hepsin is a hepatic serine protease that has been shown to activate Factor VII in the extrinsic blood coagulation pathway, leading to thrombin formation, and to be required for mammalian cell growth [33].

In summary, we have confirmed heart as a site of abundant corin mRNA expression and demonstrated for the first time the expression of corin as a 125-135 kDa protein in this tissue. In addition, we have localized corin protein in heart to myocytes, the same cardiac cells that express pro-ANP. These data support recently reported in vitro evidence that the corin proteolytic domain is the pro-ANP convertase [20] and thus the proposal that corin has a role in regulating blood pressure. Possible additional functions of the serine protease domain, and the functions of the other corin domains, are not yet known. The putative phosphorylation sites in the cytoplasmic domain of corin suggest that the intracellular domain may be a target for phosphorylation and may therefore mediate signalling events from the cell surface. A better understanding of the role of corin in heart will provide insight into basic molecular mechanisms of cardiac function and could provide a rational target for both diagnostic and therapeutic applications.
Cell surface proteolysis has emerged as an important mechanism for the generation of biologically active proteins that mediate a diverse range of cellular functions. The proteolytic activities of membrane-anchored proteins, such as ADAMs¹ (1) and MT-MMPs (2), are thought to play central roles in cell surface-activating events. In contrast, most members of the serine protease family, one of the oldest characterized and largest multigene proteolytic families, are either secreted enzymes or sequestered in cytoplasmic storage organelles awaiting signal-regulated release. These serine proteases have well characterized roles in diverse cellular activities, including blood coagulation, wound healing, digestion and immune responses, as well as tumor invasion and metastasis. However, during the last few years there has been an explosion in the identification of transmembrane proteins containing C-terminal extracellular serine protease domains. These enzymes are ideally positioned to interact with other proteins on the cell surface as well as with soluble proteins, matrix components and proteins on adjacent cells. In addition, these membrane-spanning proteases have cytoplasmic N-terminal domains, suggesting possible functions in intracellular signal transduction. This review delineates for the first time this emerging class of cell surface proteolytic enzymes, the type II transmembrane serine proteases (TTSPs), highlighting their structural features, expression profiles and possible roles in mediating cell surface proteolytic events.

Structural Features of TTSPs

In mammals the TTSPs currently consist of 17 members (Table I), of which seven are found in man. Enteropeptidase (also known as enterokinase) (3), because of its essential role in the processing of digestive proteases, was the first member of this group to be discovered, nearly a century ago. The other more recently identified members include hepsin (4), human airway trypsin-like protease (HAT) (5), corin (6), MT-SP1 (7) (also known as matriptase (8)), TMPRSS2 (9), and most recently TMPRSS4² (10). The only nonmammalian TTSP identified to date is the Drosophila protease stubble-stubbloid (st-sb) (11). Mammalian orthologues have been reported for enteropeptidase (mouse (12), rat (13), cow (14) and pig (15)), hepsin (mouse (16) and rat (17)), corin (mouse, also known as LRP4 (18)), MT-SP1 (mouse, also known as epithin (19)) and TMPRSS2 (mouse, also known as epitheliasin (20)) (Table I). The TTSPs share a number of common structural features, including (i) a proteolytic domain, (ii) a transmembrane domain, (iii) a short cytoplasmic domain and (iv) a variable-length stem region containing modular structural domains, which links the transmembrane and catalytic domains (Fig. 1). It is this unique combination of domains that suggests novel roles for the TTSPs at the cell surface.

¹ ... uPAR, uPA receptor.

Proteolytic Domains. As is the case for the wider family of enzymes of the chymotrypsin (S1) fold,³ the proteolytic domains of the TTSPs share a high degree of amino acid sequence identity. In particular, the histidine, aspartate and serine residues necessary for catalytic activity are present in highly conserved motifs. TTSPs are synthesized as single-chain zymogens and are likely activated by cleavage following an arginine or lysine present in a highly conserved activation motif. Based on the predicted presence of a conserved disulfide bond linking the pro- and catalytic domains (Fig. 1), the TTSPs are likely to remain membrane-bound following activation. However, the isolation of soluble forms of enteropeptidase (21, 22), HAT (23) and MT-SP1 (24) suggests that the extracellular domains of at least some of the TTSPs may also be shed from the cell surface. Other cysteine residues conserved among the TTSPs include six cysteines predicted to form three disulfide bonds within the protease domain. Enteropeptidase and hepsin each have one, and corin two, additional predicted disulfide linkages within the catalytic domain. The presence of an aspartate six residues before the catalytic serine, which in the activated TTSP would be positioned at the bottom of the S1 substrate-binding pocket, indicates that all of the TTSPs prefer substrates containing an arginine or lysine at the P1 position (S1 and P1 designations are described in (25)). The cleavage specificities and candidate physiological substrates of some of the TTSPs have been elucidated. The predicted specificity for cleavage following basic amino acids indicates that the TTSPs are likely to have a degree of autocatalytic activity. Indeed, truncated mouse hepsin lacking cytoplasmic and transmembrane domains (16) and the human MT-SP1 proteolytic domain (7) are capable of autoactivation. In contrast, bovine enteropeptidase has extremely low autocatalytic activity (26). Interestingly, the proteolytic domain of bovine enteropeptidase has an additional role in targeting enteropeptidase to the apical membrane of enterocytes (27).
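The reasoning above, from an aspartate six residues before the catalytic serine to a preference for Arg or Lys at P1, can be expressed as a simple sequence check. This is a minimal sketch, assuming the catalytic serine sits in a GDSGG motif (true of trypsin and chymotrypsin but not guaranteed for every serine protease domain). The two short windows are recalled from bovine trypsin and chymotrypsin A for illustration only and should be checked against a sequence database before reuse. In chymotrypsinogen numbering, the checked residue is 189 and the catalytic serine is 195.

```python
from typing import Optional

# Catalytic-serine motif assumed for this sketch.
MOTIF = "GDSGG"

def s1_residue(domain: str) -> Optional[str]:
    """Return the residue six positions before the catalytic serine, or None."""
    idx = domain.find(MOTIF)
    if idx < 0:
        return None
    serine = idx + 2  # index of the S in GDSGG
    if serine < 6:
        return None   # window too short to reach the S1-pocket residue
    return domain[serine - 6]

def predicts_basic_p1(domain: str) -> bool:
    """An Asp at the bottom of the S1 pocket suggests Arg/Lys at P1."""
    return s1_residue(domain) == "D"

TRYPSIN_WINDOW = "KDSCQGDSGGPVV"       # illustrative window, bovine trypsin
CHYMOTRYPSIN_WINDOW = "VSSCMGDSGGPLV"  # illustrative window, bovine chymotrypsin A

for name, window in [("trypsin", TRYPSIN_WINDOW), ("chymotrypsin", CHYMOTRYPSIN_WINDOW)]:
    print(name, s1_residue(window), predicts_basic_p1(window))
```

Run on these windows, the check reports Asp for the trypsin window (basic P1 predicted) and Ser for the chymotrypsin window (basic P1 not predicted), matching the trypsin-like specificity attributed to the TTSPs.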

Transmembrane Domains. Each of the TTSPs contains a hydrophobic domain near the N terminus. This domain is predicted to span the plasma membrane such that the proteolytic domain lies extracellularly, presumably to localize TTSP proteolytic activity in close proximity to target substrates and/or to permit regulated release of the protein from the cell surface. Cell surface localization has been experimentally demonstrated for enteropeptidase, hepsin (28, 29), MT-SP1 (30, 31), TMPRSS2 (20) and TMPRSS4 (10).

² Originally designated TMPRSS3 (10). The Human Genome Nomenclature Committee-approved symbol TMPRSS3 has been allocated to a predicted TTSP-encoding gene located on chromosome 21q22.3 (66). The amino acid sequence of the TMPRSS3 protein has not been reported.

³ Information on the classification and nomenclature of the S1 family of peptidases can be found in the Internet-accessible MEROPS database.

Table I

Summary of type II transmembrane serine proteases
The abbreviations used are: b, brain; bl, bladder; bp, Drosophila S6-h pupae; c, colon; de, Drosophila 12-18-h embryo; dp, Drosophila early prepupae; e, esophagus; h, heart; int, intestine; k, kidney; l, lung; le, leukocytes; li, liver; p, pancreas; pl, placenta; pr, prostate; psi, proximal small intestine (si); s, spleen; st, stomach; t, testes; th, thymus; tr, trachea.

Cytoplasmic Domains. The cytoplasmic domains of the TTSPs (Fig. 1) range in length from 12 amino acids for HAT to 112 amino acids for murine corin. Whether these domains have the potential to support interactions with cytoskeletal components and signaling molecules is not yet known. However, a number of the TTSPs, including corin, MT-SP1, st-sb and TMPRSS2, contain consensus phosphorylation sites for either or both of protein kinase C and casein kinase II. In addition, based on the cellular sorting of other integral membrane proteins (32), it is likely that the cytoplasmic and transmembrane domains also contribute to the targeting of the TTSPs to a particular cell surface in polarized cells.
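Consensus phosphorylation sites such as those mentioned above are typically located by pattern matching. A minimal sketch follows, assuming the widely used PROSITE-style patterns [ST]-x-[RK] for protein kinase C and [ST]-x(2)-[DE] for casein kinase II; the review does not state which patterns or tools were used, and the test sequence is hypothetical. These patterns are short and match frequently, so a hit marks only a potential site.

```python
import re

# Assumed consensus patterns (PROSITE-style), written as zero-width
# lookaheads so that overlapping matches are all reported.
MOTIFS = {
    "PKC": re.compile(r"(?=[ST].[RK])"),   # S/T, any residue, R/K
    "CK2": re.compile(r"(?=[ST]..[DE])"),  # S/T, any two residues, D/E
}

def scan(seq: str) -> dict[str, list[int]]:
    """Return 1-based positions of the phosphorylatable S/T for each motif."""
    seq = seq.upper()
    return {name: [m.start() + 1 for m in pat.finditer(seq)]
            for name, pat in MOTIFS.items()}

if __name__ == "__main__":
    cytoplasmic = "MSPRTAAEDSGK"  # hypothetical cytoplasmic-domain fragment
    print(scan(cytoplasmic))
```

For the toy fragment, Ser2 and Ser10 fit the PKC pattern and Thr5 fits the CK2 pattern; in a real analysis the scan would be restricted to the cytoplasmic domain, since only that region is accessible to these kinases.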

Stem Regions. The stem regions of the TTSPs contain as many as 11 structural domains that may serve as regulatory and/or binding domains (Fig. 1). These include low density lipoprotein (LDL) receptor class A domains; group A scavenger receptor (SR) domains; frizzled domains; complement C1r/C1s, urchin embryonic growth factor and bone morphogenic protein 1 (CUB) domains; sea urchin sperm protein, enterokinase, agrin (SEA) domains; a meprin, A5 antigen, and receptor protein phosphatase (MAM) domain; and a disulfide knotted domain. Hepsin is the only TTSP that does not possess an identified structural domain within its stem region. Although functional roles for individual stem region domains have not been demonstrated, the stem region of bovine enteropeptidase has been shown to be required for efficient cleavage of its physiological substrate trypsinogen (26). In addition, the N terminus of the stem region of this protein is required for delivery of enteropeptidase to the apical surface of polarized Madin-Darby canine kidney cells (27).

The most common stem region structural domain is the LDL receptor class A domain: corin contains eight, MT-SP1 four, enteropeptidase two, and TMPRSS2 and TMPRSS4 one each (Fig. 1). Although the function of these domains in the TTSPs has not been demonstrated, in other proteins they bind Ca²⁺ ions and mediate the internalization of macromolecules, including serine protease-inhibitor complexes and lipoproteins (33-35). However, although LDL receptor domains also function in the uptake of LDLs, increased LDL uptake could not be demonstrated following expression of murine corin in COS cells (18).

Six other structural domains that are thought to be involved in protein-protein or protein-ligand interactions are found in various TTSPs. SR domains (36) are present in corin, enteropeptidase, TMPRSS2 and TMPRSS4; frizzled domains (37) are present in corin; CUB domains (38) are present in enteropeptidase and MT-SP1; SEA domains (39) are present in HAT and enteropeptidase; a MAM domain (40) is present in enteropeptidase; and a disulfide knotted domain (41) is present in st-sb (Fig. 1). In addition to these structural domains, human and mouse MT-SP1 possess a conserved RGD motif (42) in the first CUB domain. Interestingly, truncated human MT-SP1 lacking cytoplasmic and transmembrane domains remains bound to the cell surface of COS cells (31). Binding may be mediated via an interaction between the MT-SP1 RGD motif and an integrin or another cell surface protein. Alternatively, the mode of attachment could be via a direct link such as a hydrocarbon chain.

Tissue Expression of TTSPs 

Although a few of the TTSPs are expressed across several tissue and cell types, in general these enzymes show relatively restricted expression patterns, indicating that they may have tissue-specific functions (Table I). Enteropeptidase shows a very narrow expression pattern, being restricted in normal tissues to enterocytes of the proximal small intestine (12). Corin expression is also quite specific, with corin mRNA highly expressed in human heart (6) and corin protein expression localized to cardiac myocytes (43). HAT is predominantly expressed in trachea (5, 23). Human TMPRSS2 expression is predominantly associated with prostate (9, 44).⁴ Hepsin, originally identified from liver, is highly expressed in fetal liver and kidney (45). Hepsin mRNA has been reported to be overexpressed by ovarian tumors (46), and hepsin protein has been localized to tumor cell membranes in renal cell carcinoma (29). TMPRSS4 has only recently been characterized and was identified as a consequence of its strong up-regulation in pancreatic tumors (10). While TMPRSS4 was not detected in normal pancreas, very low-level TMPRSS4 mRNA expression was detected in tissues of the gastrointestinal tract and in some tissues of the urogenital tract (10). MT-SP1 was originally identified from a human breast cancer line (30) but shows the broadest expression pattern of the TTSPs, being detected in a wide range of both human (7) and murine tissues (19).

⁴ The Northern blot data reported (9) are incorrectly labeled due to inversion of the membranes (Stylianos Antonarakis, personal communication).

Fig. 1. Type II transmembrane serine protease domain structure. Structures, listed by length, are of the seven human TTSPs and the Drosophila TTSP st-sb. The amino acid (aa) sequence of each protein was scanned using the ProfileScan algorithm to confirm the presence of each domain. Numbers delineate the location of each domain.

Biochemical Data and Pathophysiological Roles

The majority of the TTSPs have been identified relatively recently and consequently have not been extensively characterized. Enteropeptidase is somewhat of an exception. Although the enzymatic activity ascribed to enteropeptidase was first identified almost a century ago (47), the complete amino acid sequence was described only recently (3). Enteropeptidase functions near the apex of the digestive enzymatic cascade, activating the digestive protease trypsinogen to trypsin, which subsequently activates other enzymes including chymotrypsinogen, proelastase, prolipases and procarboxypeptidases. Enteropeptidase possesses extremely low autocatalytic activity, and it has been proposed that the serine protease duodenase, secreted by duodenal epitheliocytes, may be its physiological activator (48). Active enteropeptidase consists of heavy and light chains that are extensively glycosylated (27, 49). It has recently been reported that physiological concentrations of pancreatic trypsin activate protease-activated receptor (PAR) 2 at the apical membrane of enterocytes (50). PAR2 is a member of the PAR family of signal-transducing, G protein-coupled, plasma membrane-spanning receptors, which are activated by the proteolytic action of select serine proteases (51, 52). These data, and the observation that an exosite in the heavy chain of enteropeptidase is required for efficient recognition of trypsinogen (26), suggest that enteropeptidase may play a role in facilitating trypsin-mediated PAR2 activation on enterocytes. Thus enteropeptidase may localize trypsinogen/trypsin at the membrane of enterocytes, initiating a limited proteolytic cascade at the cell surface in close proximity to the trypsin cleavage target PAR2, thereby facilitating receptor activation and signal transduction.

Hepsin is a glycoprotein originally cloned from human liver and hepatoma cell lines and, more recently, implicated in mammalian cell growth and morphology (53), tumor progression (28) and developmental processes such as blastocyst hatching (16). The importance of hepsin in vivo, however, remains unclear, as homozygous hepsin-null mice are phenotypically normal (54). An as yet unexplained phenotype of the hepsin−/− mice is a 2-fold higher serum concentration of bone-derived alkaline phosphatase compared with wild-type mice (55).

The human airway TTSP, HAT, was originally purified as a soluble protein from the sputum of patients with chronic airway diseases. Full-length HAT is synthesized and translocated to the cell surface, where it is processed to a soluble form and then released from tracheal serous glands as part of the host immune defense system (5).

Significantly, the human heart TTSP corin is an in vitro activator of pro-atrial natriuretic peptide (ANP), a cardiac hormone essential for the regulation of blood pressure (56), suggesting that corin is the long-sought pro-ANP convertase. This proteolytic cleavage is critical for the regulation of ANP activity (57); thus, corin may well prove to be an important factor in the regulation of major cardiovascular diseases. Dysfunctional corin was proposed as a candidate for the rare congenital heart disease total anomalous pulmonary venous return (TAPVR), as the corin gene colocalizes with the TAPVR locus on human chromosome 4p12-13 (6). In addition to heart, murine corin is expressed by chondrocytes in a differentiation stage-specific manner during mouse development, suggesting that this protease may play a role in chondrocyte differentiation/bone formation (6). However, while human and murine corin share high homology, common structural features, expression profiles and syntenic chromosomal locations, these proteases differ in the lengths of their cytoplasmic domains (45 residues in human and 112 in mouse) and show no conservation of amino acid sequence in this domain. This may indicate that murine and human corin have different but perhaps overlapping species-specific roles or, alternatively, that the cytoplasmic domain is not essential for corin function.

In other significant recent experiments it has been shown that MT-SP1 may be involved in initiating signaling and proteolytic cascades via the activation of the cell surface-associated proteins PAR2 and pro-uPA (31). Interestingly, MT-SP1 from breast cancer cells is detected largely as an uncomplexed protein, whereas in milk it is present mainly as a complex with the Kunitz-type serine protease inhibitor hepatocyte growth factor activator inhibitor-1 (24). It will be important to identify the inhibitor-binding domains of MT-SP1 and the function of the protease-inhibitor complex.

TMPRSS2 and TMPRSS4 have been identified through association with cancer. TMPRSS2 is thought to play a role in epithelial cell biology, and its association with prostate carcinogenesis has led to the proposal that it may be a diagnostic or therapeutic target for prostate cancer (44). TMPRSS2 has been proposed to be part of an enzymatic cascade involving the serine proteases prostate-specific antigen and human kallikrein K2 in a manner analogous to the fibrinolytic and blood coagulation cascades (44). TMPRSS4 is overexpressed in pancreatic cancers; however, its functional significance remains unclear (10).

The Drosophila serine protease sb-sbd is one of a number of proteases involved in fly morphogenesis (11) and has a proteolytic function in detaching imaginal disks from extracellular matrices. In addition, the phenotype of sb-sbd mutants has led to speculation that the encoded protein is involved in outside-to-inside signal transduction via its cytoplasmic domain, resulting in cytoskeletal reorganization and changes in cell shape during morphogenesis (11).

Analogous Membrane-associated Proteolytic Systems
In contrast to the traditional protein catabolic functions of many of the secreted members of the serine protease family, and based on the presence of multiple structural domains in the TTSPs, it is tantalizing to speculate that the TTSPs function as key regulators of signaling events at the plasma membrane. Precedents for such functions come from other, better characterized membrane-associated proteolytic systems such as the ADAMs (1), the MT-MMPs (2), and the uPA-uPA receptor system (58).

The ADAMs have recognized and proposed roles in the proteolysis of extracellular matrix (ECM) components and cell surface proteins, in mediating cell adhesion via integrin binding, in cell fusion and signaling via interactions of their cytoplasmic domains, and in RGD-mediated interactions with integrins (59-61). The TTSPs are similarly positioned at the plasma membrane to release ECM components, to proteolytically activate cell surface proteins such as PARs, growth factors, and cytokines, and to interact with cell surface and soluble ligands. In addition, the presence of the cytoplasmic domains indicates that the TTSPs may be capable of interacting with the cytoskeleton and/or with cellular signaling molecules.

The MT-MMPs function in pericellular cascades to activate other MMPs involved in the cleavage of ECM components. The TTSPs may well perform similar functions in activating proteolytic cascades on the plasma membrane. Indeed, this function has been demonstrated for enteropeptidase in the activation of digestive proteases. Moreover, there is increasing evidence for cross-talk between proteolytic systems. The uPA-uPA receptor system of cell surface-localized proteolytic activity has a recognized role in the initial stage of MMP activation (62), and other serine proteases are also capable of in vitro MMP activation (63, 64). The TTSPs could play a direct role in MMP activation or an indirect role in localizing and activating other serine proteases more directly associated with MMP activation. The activation of uPA by MT-SP1 (31) and subsequent downstream MMP activation could be an example of such cross-talk.

Several other parallels may also be drawn from the uPA-uPA receptor system. That the TTSPs are directly anchored to the plasma membrane implies that they have the potential to mimic the localization of the uPA-uPAR system to the leading edge of migrating tumor cells (65). Further, the nonproteolytic role of the uPA-uPAR system in mediating cell-cell contacts through association with integrins may also parallel TTSP properties. Indeed, the multidomain structure of the TTSPs indicates their capacity to interact with multiple partners and suggests that these membrane proteins may form part of a signalosome-like complex, thereby mediating multiple signaling pathways at the cell surface, as is the case for the uPA-uPAR system (58).

Concluding Remarks

What is known about the TTSPs is that they function, or have the structural motifs necessary to function, as serine proteases. What can be speculated is that their numerous and varied nonproteolytic domains are likely to mediate interactions with proteolytic substrates and inhibitors as well as with other proteins and ligands. Such interactions will potentially regulate the proteolytic activity of the catalytic domain, but these domains may also have functions quite independent of it. Furthermore, given the integral plasma membrane nature of the TTSPs, it is tempting to speculate that at least some of the TTSPs will function directly in transducing signals across the plasma membrane, as has been suggested for the Drosophila TTSP sb-sbd (11). There is clearly a need for a greater understanding of the biology and physiological functions of this group of unique proteases to obtain a better picture of the dynamics occurring on the cell surface. Because of the mosaic structure of the TTSPs, it will be important to understand the role of their individual domains as well as the role of each protein in toto.

Note Added in Proof. Two cDNAs encoding the putative TTSPs Xesp-2 and XMT-SP1 have recently been identified from Xenopus laevis (67).
Abstract We report the isolation of a cDNA encoding a novel surface serine proteinase, epitheliasin. The cDNA spans 1753 bp and encodes a mosaic protein with a calculated molecular mass of 53 529 Da. Its domains include a cytoplasmic tail, a type II transmembrane domain, a low-density lipoprotein receptor class A domain, a cysteine-rich scavenger receptor-like domain and a serine proteinase domain. The proteinase domain shows 46-53% identity with mouse neurotrypsin, acrosin, hepsin and enteropeptidase. The gene, located in the telomeric region of the long arm of mouse chromosome 16, consists of 14 exons and 13 introns and spans approximately 18 kb. Epitheliasin is expressed primarily at the apical surfaces of renal tubular and airway epithelial cells.

© 2000 Federation of European Biochemical Societies.
Key words: Serine proteinase; Mosaic protein; Epitheliasin



1. Introduction

Proteinases are implicated in a wide spectrum of physiologic and pathophysiological processes in the kidney. Renin, a proteinase synthesized in renal cortical cells, plays a major role in the regulation of blood pressure and electrolyte balance by converting angiotensinogen to angiotensin I. Furthermore, the renal kallikrein-kinin system, activated under conditions of mineralocorticoid excess, represents a compensatory response against the development of hypertension and renal injury induced by salt excess. Proteolytic enzymes have also been ascribed important roles in both leukocyte-dependent and leukocyte-independent models of glomerular disease (reviewed in [1]). Recently, Vallet and colleagues identified a novel serine proteinase, CAP1, from Xenopus laevis kidney epithelial cells that is involved in activation of the epithelial sodium channel, ENaC [2]. This was the first report of channel-activating activity of an endogenous proteinase.

In the present report, we describe a novel serine proteinase expressed in murine renal epithelial cells with sequence homology to CAP1. The enzyme, which we term epitheliasin, is a modular protein consisting of five sequence motifs: a cytoplasmic tail, a type II transmembrane (TM) domain, a low-density lipoprotein receptor class A (LDLRA)-like domain, a cysteine-rich scavenger receptor-like (SRCR) domain and a serine proteinase domain. The sequence and structural features of the epitheliasin cDNA and gene, its chromosomal localization and its tissue expression are described. Epitheliasin has sequence identity to a human cDNA, named TMPRSS2, recently cloned by exon trapping [3]. However, the tissue distributions of epitheliasin and TMPRSS2 are strikingly different.

2. Materials and methods 

2.1. Materials

Multiple tissue Northern blots, ExpressHyb hybridization solution, rapid amplification of cDNA ends (RACE)-ready cDNAs from mouse kidneys and Marathon cDNA kits were from CLONTECH (Palo Alto, CA, USA). TA cloning kits were from Invitrogen (Carlsbad, CA, USA). LA PCR kits were from Panvera (Madison, WI, USA). Klenow DNA polymerase, [α-32P]dCTP (3000 Ci/mmol) and [γ-32P]dATP (3000 Ci/mmol) were from Amersham Life Science (Arlington Heights, IL, USA). BupH Tris-glycine SDS, Tris-glycine and Immunogen Conjugation kits were from Pierce (Rockford, IL, USA). Alkaline phosphatase-conjugated goat anti-rabbit antibody was from Zymed (San Francisco, CA, USA). BCIP/NBT tablets were from Sigma (St Louis, MO, USA). Citra solution and VIP substrate were from Vector Laboratories (Burlingame, CA, USA). Blocking reagent, SA-HRP and biotinyl tyramide were supplied by NEN Life Science Products (Boston, MA, USA).

2.2. Identification and cloning of epitheliasin cDNA

A conserved sequence around the serine active site residue (GGIDSCQGDSGGPLVC) was used to search the mouse EST database using TBLASTN. Of the 100 ESTs initially identified, a novel EST (ub55g01.s1) containing 359 nt and its mirror sequence (ub55g01.r1) were further analyzed against the non-redundant databases using BLASTN and BLASTX. Four overlapping sequences were found in these searches: one from a kidney library (uc51c11.y1), two from a mammary gland library (vf86g09.r1, ve37c12.r1) and one from a blastocyst library (vI64c03.r1).

To obtain the full-length cDNA of interest, the RACE strategy was employed. Initially, LA PCR was used to amplify mouse kidney cDNA with a sense primer (5'-CCATACTCAACTCCTCATGCTGCT-3') designed based on the novel sequence and an anchor primer, AP1. The initial PCR product was subjected to nested PCR using a sense primer (5'-CTGACACAGCCAGGATGGCATTG-3') and an antisense primer (5'-GTGGATTAGCTGTTCGCCCTCATT-3'). This nested reaction amplified a 1.5 kb product that was ligated into the pCR2.1 vector and sequenced using an ABI automated sequencer.

To obtain the 3' end, mouse kidney cDNA was subjected to 3'-RACE. The cDNA was amplified using AP1 and a sense primer (5'-CCATACTGAACTCCTCATGCTGCT-3'). The product was diluted (1:50) and a nested PCR amplification was performed using a second anchor primer, AP2, and a sense primer (5'-CTGACACAGGCAGGATGGCATTG-3'). The 2 kb PCR product obtained was cloned and sequenced as described above.

2.3. Genomic cloning and analysis 

To obtain the epitheliasin gene, a mouse genomic bacterial artificial chromosome (BAC) library (Genome Systems, St Louis, MO, USA) was screened using a 0.7 kb probe extending from nt 831 to 1477 of the mouse epitheliasin cDNA. A single clone (BAC-24) was identified and confirmed by sequencing to contain the entire epitheliasin gene. To identify the intron junction borders, DNA from BAC-24 was directly sequenced using oligonucleotide primers defined initially by the cDNA sequences and subsequently by derived sequences. Southern analysis was used to determine the size of the epitheliasin gene.

2.4. Chromosomal assignment

The plasmid clone (BAC-24) obtained from the genomic library was used as a probe for chromosomal localization by fluorescence in situ hybridization (FISH). The probe was labeled with biotin by nick translation, hybridized to metaphase chromosomes and detected with Cy3-conjugated streptavidin. Chromosome spreads were prepared by standard procedures and G-banded after trypsin treatment and Wright's staining. Hybridization and detection on metaphase chromosomes were performed as previously described [4]. Probe signals were detected with the Cy3 conjugate and viewed using an epifluorescence microscope. The fluorescence image was overlaid on the G-banded image to localize the gene.

2.5. Northern blot analysis 

Mouse multiple-tissue blots containing 2 μg of poly(A)+ RNA in each lane were prehybridized for 1 h at 68°C, then hybridized at 68°C with a 1.5 kb [α-32P]dCTP-labeled probe representing the coding region of the mouse epitheliasin cDNA. After low-stringency washes, the blots were washed at high stringency at 50°C and autoradiographed.

2.6. Production of antibodies against epitheliasin

Rabbit polyclonal antiserum was raised to a synthetic peptide (HPNYDSKTKNND) located in the serine proteinase region of epitheliasin. The peptide was chosen based on predicted surface hydrophilicity and antigenicity, and was coupled to keyhole limpet hemocyanin. Rabbits were injected subcutaneously with 100 μg of conjugate emulsified in Freund's complete adjuvant and then boosted with the same amount of antigen in Freund's incomplete adjuvant at 2-week intervals until a titer of >1:4000 was obtained. The presence of anti-peptide antibodies was assessed by dot blot analysis using the peptide linked to ovalbumin as the antigen.
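The peptide was selected for predicted hydrophilicity, but the text does not say which scale was used. As an illustration only, the sketch below assumes the Kyte-Doolittle hydropathy scale and averages it over the immunizing peptide; a negative mean indicates an overall hydrophilic stretch. Other scales (e.g. Hopp-Woods) could rank candidate peptides differently.

```python
# Kyte-Doolittle hydropathy values. Choosing this scale is an assumption made for
# illustration; the source does not state which hydrophilicity scale was applied.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle score; negative values indicate a hydrophilic peptide."""
    scores = [KYTE_DOOLITTLE[aa] for aa in peptide.upper()]
    return sum(scores) / len(scores)


immunogen = "HPNYDSKTKNND"   # peptide sequence as given in the text
print(round(mean_hydropathy(immunogen), 2))
```

The mean for this peptide is about -2.74, i.e. strongly hydrophilic, which is consistent with the stated selection criterion; in practice a sliding-window profile over the whole protein would be used to pick candidate peptides.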

2.7. Immunohistology

Mouse kidneys and lungs were fixed in buffered 10% formaldehyde and embedded in paraffin. Sections were cut at 5 μm, deparaffinized and rehydrated. Following antigen retrieval with 1× Citra solution in a microwave oven for 15 min at 700-900 W, the samples were washed in PBS. Endogenous peroxidase activity was blocked with methanol and H2O2 in PBS for 30 min at room temperature. The tissue was permeabilized using 10% Triton X-100 in PBS for 20 min at room temperature. Endogenous biotin was blocked with Vector avidin block solution for 30 min at room temperature, followed by Vector blocking solution for 30 min at room temperature. The sections were then incubated overnight at 4°C in a humid chamber with epitheliasin peptide antiserum diluted 1/500 in blocking solution. After washing with TNT, horse anti-rabbit IgG serum (1/500 in TNT) was applied for 30 min at room temperature. The slides were then incubated with SA-HRP (1/100 in TNT) for 30 min at room temperature. The signal was amplified with biotinyl tyramide for 5 min at room temperature, followed by a second incubation with SA-HRP (1/100 in TNT), and was visualized using VIP substrate solution. Control slides were processed identically, except that the epitheliasin antiserum was replaced by non-immune rabbit serum.



3. Results and discussion 

3.1. Cloning and analysis of the epitheliasin full-length cDNA
Fig. 1 shows the nucleic acid and deduced amino acid sequences of the complete cDNA reconstituted from the RACE fragments. As demonstrated by the immunohistochemistry described in a following section, the encoded protein is highly expressed in epithelial tissue; accordingly, we named the protein epitheliasin. The composite cDNA spans 1753 nt. A 5' untranslated region (UTR) extends 100 nt. The first in-frame ATG (nt 1-3) was assigned as the Met translation initiation codon, since the sequence around this codon (GATGG) conforms to the Kozak consensus sequence for mammalian protein biosynthesis [5]. A single open reading frame begins with this ATG and extends 1470 nt. It is followed by a stop codon, TAA (nt 1471-1473), and a 3' UTR of 152 nt, terminating in a poly(A)+ tail of 28 nt. A consensus polyadenylation site (ATTAAA, nt 1600-1605) is located 20 nt upstream of the poly(A)+ tail.
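The coordinates above are internally consistent, and a short bookkeeping check makes that explicit. The sketch below uses only the values stated in the text, with nucleotide 1 taken as the A of the initiator ATG (so the 100-nt 5' UTR lies upstream of position 1); the variable names are illustrative.

```python
# Values reported for the epitheliasin cDNA. Nucleotide 1 is the A of the
# initiator ATG, so the 5' UTR lies upstream of position 1.
UTR5_LEN = 100
ORF_LEN = 1470                 # ATG through the last sense codon
STOP_CODON = (1471, 1473)      # TAA
UTR3_LEN = 152
POLYA_TAIL_LEN = 28
POLYA_SIGNAL = (1600, 1605)    # ATTAAA


def span_length(span):
    """Length of an inclusive (start, end) nucleotide span."""
    start, end = span
    return end - start + 1


assert ORF_LEN % 3 == 0, "ORF length should be a whole number of codons"
codons = ORF_LEN // 3                                   # amino acids encoded
utr3 = (STOP_CODON[1] + 1, STOP_CODON[1] + UTR3_LEN)    # 3' UTR span
tail_start = utr3[1] + 1                                # first A of the poly(A) tail
signal_to_tail = tail_start - POLYA_SIGNAL[1] - 1       # nt between ATTAAA and tail
cdna_total = UTR5_LEN + STOP_CODON[1] + UTR3_LEN + POLYA_TAIL_LEN

print(codons, utr3, signal_to_tail, cdna_total)  # prints: 490 (1474, 1625) 20 1753
```

The 1470-nt ORF gives the 490 residues reported in Section 3.2, and the segment lengths sum to the 1753-nt composite cDNA.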



3.2. Characteristics of the sequence and structural features of epitheliasin

The open reading frame encodes a protein of 490 amino acids with a calculated molecular mass of 53 529 Da. Comparisons with sequences in GenBank, EMBL and SWISS-PROT reveal that the epitheliasin … main serine proteinase. A typical amino-terminal signal sequence is not present, but a hydrophobic region (from Leu to Trp) is present near the amino terminus. This 22-amino-acid region is flanked by charged amino acids (Lys and Arg) and corresponds to a transmembrane domain [6]. Based on the difference in total charge between the 15-residue sequences on either side of the membrane-spanning domain, epitheliasin can be classified as a type II integral membrane protein [7,8] with a cytosol-facing amino-terminal tail of 83 amino acids (Met1 to Ser83) and an extracellular COOH-terminal modular region. The absence of a signal peptide and the presence of a transmembrane domain in epitheliasin parallel the homologous serine proteinases enteropeptidase, a key digestive enzyme responsible for the conversion of trypsinogen to trypsin [9]; hepsin, a membrane-associated proteinase involved in the formation of thrombin on cell surfaces [10]; and a recently described human airway trypsin-like proteinase [11].
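The topology argument above compares the net charge of the 15 residues on either side of the transmembrane segment. A minimal sketch of this "positive-inside" heuristic follows, under simplifying assumptions: only Lys/Arg (+1) and Asp/Glu (-1) are counted, His is ignored, and the TM boundaries are supplied by hand. The toy sequence is hypothetical and is not epitheliasin.

```python
# Simplified positive-inside heuristic: the flank carrying more positive charge is
# predicted to face the cytosol. Only K/R (+1) and D/E (-1) are counted.
POSITIVE = frozenset("KR")
NEGATIVE = frozenset("DE")


def net_charge(segment):
    """Net charge of a peptide segment, counting K/R as +1 and D/E as -1."""
    return sum(aa in POSITIVE for aa in segment) - sum(aa in NEGATIVE for aa in segment)


def predict_topology(seq, tm_start, tm_end, window=15):
    """Classify a single-pass membrane protein from its TM flanks.

    tm_start and tm_end are 0-based, end-exclusive indices of the TM segment;
    `window` residues are inspected on each side (15, as in the text).
    """
    n_flank = seq[max(0, tm_start - window):tm_start]
    c_flank = seq[tm_end:tm_end + window]
    delta = net_charge(n_flank) - net_charge(c_flank)
    if delta > 0:
        return "type II (N-terminus cytosolic)"
    if delta < 0:
        return "type I (C-terminus cytosolic)"
    return "undetermined"


# Hypothetical toy sequence: basic N-flank, 22-residue hydrophobic core, acidic C-flank.
n_flank = "MAKRKSRGPKTQSRK"
tm = "LLAVILGLLVAAFLIWGIALLV"
c_flank = "DESEGTNDSEQGDTS"
toy = n_flank + tm + c_flank
print(predict_topology(toy, len(n_flank), len(n_flank) + len(tm)))
```

For the toy sequence the N-flank carries +7 and the C-flank -6, so the sketch returns a type II orientation. Real topology predictors combine this rule with hydrophobicity profiles and other features.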

The predicted domain structure of epitheliasin is shown in Fig. 2. An LDLRA domain (Cys to Cys) containing six cysteines follows the transmembrane domain. This domain motif is found in a number of proteins that are functionally unrelated to the LDLR family, including clotting proteinases and enteropeptidase. In each of these proteins the domain is thought to function as a protein-binding domain. The LDLRA domain in epitheliasin is similar to other typical LDLRA domains, which are about 40 amino acids long and contain six cysteines [12]. The cysteines form intradomain bridges, producing a cluster of negatively charged residues in a single loop positioned for high-affinity binding to positively charged sequences in LDLR ligands.

Following the LDLRA domain, an SRCR-like domain extends from Val to Gly. SRCR domains are classified into two groups, group A and group B, according to the number of conserved cysteine residues, six or eight, respectively [13]. In a recent analysis, all but one of the 33 independent SRCR domains previously identified had six or eight cysteines [14]. An unusual feature of this domain in epitheliasin is that it contains only four cysteines. These cysteine residues are completely conserved in position, suggesting that the domain belongs to group A. The SRCR domain closest to that of epitheliasin is present in complement factor I (CFI), a serum proteinase that regulates the complement cascade by cleaving C3b and C4b. CFI contains a single SRCR domain with five cysteines [13].

The function of SRCR domains is largely unknown, but it seems likely that most of these domains are involved in binding to molecules on the cell surface or in the extracellular space. Direct evidence supporting the idea that SRCR domains mediate binding to other cell surface or extracellular proteins has recently been provided [14,15].

3.3. Features of the serine proteinase domain

The proteinase domain begins with an Ile residue and represents the major domain (about 50%) of the encoded protein. The predicted molecular mass of the domain is 25 892 Da. The domain contains all the major features common to the S1 family of the chymotrypsin (or SA) clan of serine proteinases. The residues contributing to the salient structural features of chymotrypsin include: (1) His57, Asp102 and Ser195, which make up the catalytic triad; (2) Gly193, Asp194 and Ser195, which form the oxyanion hole required for catalytic efficiency; (3) Ser214, Trp215 and Gly216, which bind the main chain of a substrate; and (4) residues that occupy the bottom (Ser189) and sides (Gly216 and Gly226) of the substrate specificity pocket (S1 subsite). All of the residues contributing to the first three features, and Gly216 and Gly226 on the sides of the specificity pocket, are strictly conserved in epitheliasin. However, in epitheliasin the residue corresponding to Ser189 of chymotrypsin is replaced by an acidic residue, Asp. This suggests that epitheliasin cleaves after Lys or Arg, i.e. that the enzyme has trypsin-like substrate specificity.

Table 1
Exon-intron junction organization of the epitheliasin gene

Fig. 2. The domain organization of epitheliasin. Starting at the NH2-terminus, epitheliasin contains a TM domain followed by an LDLRA domain, an SRCR-like domain and finally the serine proteinase domain. N-glycosylation sites are indicated by circles. The numbers in parentheses refer to the amino acid residues of each domain.
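The S1-pocket argument (an Asp at the base of the pocket favors basic P1 residues) translates into a simple rule of thumb. The sketch below is a generic illustration, not a model of epitheliasin's actual substrate preferences, which the text does not characterize: it lists Lys/Arg positions as candidate P1 sites and, optionally, applies the classic trypsin exception for bonds followed by Pro (an assumption, not something stated in the source). The function name and example sequence are hypothetical.

```python
def trypsin_like_sites(seq, skip_before_pro=True):
    """Return 1-based positions of Lys/Arg residues after which a trypsin-like
    enzyme would be expected to cut.

    skip_before_pro applies the classic trypsin heuristic that K/R-Pro bonds
    resist cleavage; this is an illustrative assumption.
    """
    sites = []
    for i, aa in enumerate(seq[:-1]):   # the last residue has no bond after it
        if aa in "KR" and not (skip_before_pro and seq[i + 1] == "P"):
            sites.append(i + 1)
    return sites


print(trypsin_like_sites("AKRPGRIVGG"))                         # [2, 6]
print(trypsin_like_sites("AKRPGRIVGG", skip_before_pro=False))  # [2, 3, 6]
```

Such a rule only enumerates candidate scissile bonds; extended subsites and exosite interactions (for example, via the non-catalytic domains discussed here) determine which sites are actually used.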
Comparison of the amino acid sequence of the proteinase domain of epitheliasin with other serine proteinases indicates that this region shares identity with mouse enteropeptidase (53%), hepsin (51%), acrosin (48%) and neurotrypsin (46%), all multidomain members of the chymotrypsin family of serine proteinases with trypsin-like substrate specificity. The aforementioned CAP1 from Xenopus laevis kidney epithelial cells has 44% sequence identity with epitheliasin.
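The identity percentages above depend on how the alignment was produced and how gaps are counted, neither of which the text specifies. A minimal sketch, assuming a pairwise alignment is already available and that columns with a gap in either sequence are excluded from the denominator (one common convention among several), is shown below; the aligned fragments are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are excluded; this convention is an
    assumption, since the source does not describe its method.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


print(round(percent_identity("IVGG-DA", "IVGGTDS"), 1))  # 83.3 (5 of 6 ungapped columns)
```

Dividing by the full alignment length or by the shorter sequence instead would give a lower figure for the same pair, so identities from different sources are not always directly comparable.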

Based on findings with related vertebrate trypsinogens, we predict that epitheliasin is synthesized as an inactive zymogen that is converted to an active serine proteinase by cleavage of the Arg-Ile peptide bond in the extracellular domain of the enzyme. Most vertebrate trypsinogens are activated by proteolytic cleavage of a Lys(Arg)-Ile bond. The identity and origin of the proteinase responsible for this cleavage in epitheliasin are not known. One possibility is that epitheliasin is synthesized as a single-chain zymogen and undergoes intracellular cleavage and activation by a furin-like enzyme prior to insertion into the membrane. This is based on the Arg-Gln-Ser-Arg sequence that immediately precedes Ile-Val-Gly-Gly, the NH2-terminus of the proteinase domain. Arg-X-X-Arg motifs are furin recognition sequences [16-20]. Interestingly, all the domains of epitheliasin are flanked by recognition sites for furin-like enzymes, suggesting the need to clarify the role of furin-like enzymes in the processing of epitheliasin.
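The furin argument rests on spotting Arg-X-X-Arg motifs. A minimal sketch of such a scan follows, assuming only the minimal R-X-X-R consensus (real furin sites often carry additional basic residues, so this is a coarse filter). It is applied here to the Arg-Gln-Ser-Arg-Ile-Val-Gly-Gly junction quoted in the text; the function name is illustrative.

```python
import re

# Minimal furin consensus Arg-X-X-Arg, with cleavage after the second Arg.
# The lookahead lets overlapping motifs be reported.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def furin_candidate_sites(seq):
    """Return 1-based positions of the second Arg of each R-X-X-R match,
    i.e. the residue after which furin would be expected to cut."""
    return [m.start() + 4 for m in FURIN_MINIMAL.finditer(seq)]


junction = "RQSRIVGG"   # Arg-Gln-Ser-Arg / Ile-Val-Gly-Gly, as in the text
sites = furin_candidate_sites(junction)
print(sites)                                        # [4]
print(junction[:sites[0]], junction[sites[0]:])     # RQSR IVGG
```

Cutting after the second Arg leaves Ile-Val-Gly-Gly as the new NH2-terminus, which is the proteinase-domain start predicted above. A motif match is only a candidate; processing must be shown experimentally.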

Based on the structure of enteropeptidase and a comparison with other chymotrypsin-like serine proteinases, we also predict that epitheliasin, following intracellular cleavage, forms two chains: the smaller chain containing the proteinase domain, and the larger containing the membrane-spanning segment and the LDLRA and SRCR-like domains that may serve as substrate recognition sites. Several chymotrypsin-like serine proteinases, including enteropeptidase, have a disulfide bond that covalently links the two chains [21]. The proteinase domain of epitheliasin contains eight Cys residues in conserved positions. By comparison with chymotrypsin, three of the Cys pairs (42/58, 168/182 and 191/220) that form disulfide-bonded loops around His57, Met180 and Ser195 are conserved in epitheliasin. Although the other two cysteines are located in conserved positions, their chymotrypsin pairing counterparts, which are involved in interchain disulfide bonds, are absent. This suggests that epitheliasin is likely distinct from enteropeptidase and other multidomain serine proteinases in that it lacks disulfide bond(s) between the proteinase motif and the rest of the protein [22]. Thus, the mechanism of association of the two chains in epitheliasin is not clear.

Three asparagine-linked glycosylation sites are present in epitheliasin: one at the beginning of the LDLRA domain, one in the SRCR domain and one in the proteinase domain (see Fig. 1). Other features of the deduced primary structure include a cAMP- or cGMP-dependent protein kinase phosphorylation site (Lys-Ser). Protein kinase C phosphorylation sites are present in the cytoplasmic domain (two, both Thr-Lys), the SRCR domain (three, all Ser-Arg), between the SRCR domain and the proteinase domain (one, Ser-Lys) and in the proteinase domain (one, Thr-Lys). Three casein kinase II phosphorylation sites are present, two in the LDLRA domain (both Ser-Glu) and one in the proteinase domain (Ser-Asp). Finally, an ATP/GTP-binding site motif A is present in the proteinase domain of epitheliasin, extending from Ile to Ala. This motif is found in a number of proteins, including those of the myosin and Rhs families. The relevance of these various sites in epitheliasin is not presently known.
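Sites like these are typically found by scanning a sequence with short consensus patterns; the text does not name the tool used. As an illustration, the sketch below assumes the standard PROSITE consensus patterns for N-glycosylation (N-{P}-[ST]-{P}), protein kinase C ([ST]-x-[RK]), casein kinase II ([ST]-x(2)-[DE]) and cAMP/cGMP-dependent kinase ([RK](2)-x-[ST]), rewritten as regular expressions. The ATP/GTP-binding motif is omitted, and the example sequence is hypothetical.

```python
import re

# PROSITE consensus patterns rewritten as regular expressions. Lookaheads allow
# overlapping matches. A match is a candidate site, not evidence of modification.
PATTERNS = {
    "N-glycosylation (PS00001)": r"(?=N[^P][ST][^P])",
    "PKC phosphorylation (PS00005)": r"(?=[ST].[RK])",
    "CK2 phosphorylation (PS00006)": r"(?=[ST].{2}[DE])",
    "cAMP/cGMP-dependent kinase (PS00004)": r"(?=[RK]{2}.[ST])",
}


def scan_motifs(seq):
    """Map each pattern name to the 1-based start positions of its matches."""
    return {name: [m.start() + 1 for m in re.finditer(rx, seq)]
            for name, rx in PATTERNS.items()}


toy = "MNLSAKRRASGTLKSEEDNPTQ"   # hypothetical sequence, not epitheliasin
for name, positions in scan_motifs(toy).items():
    print(name, positions)
```

Because these patterns are short, they match frequently by chance, which is one reason the functional relevance of such predicted sites remains open until tested.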
Fig. 3. Schematic representation of the genomic organization of epitheliasin. The intron placements are depicted in relation to the domains of the mouse epitheliasin protein. The numbering represents nucleotides.
3.4. Genomic organization

The epitheliasin gene contains 14 exons separated by 13 introns (Fig. 3). The first exon is located in the 5' untranslated region. The last exon contains 9 bp of the coding sequence, the stop codon and the 3' untranslated region. The exon distribution reflects the organization of the deduced protein. Exons 2 and 3 (68 and 220 nt, respectively) encode the cytoplasmic domain. Exon 4 (87 nt) encodes the transmembrane domain. Exon 5 (117 nt) encodes the LDLRA domain. An unusual feature of epitheliasin is that the SRCR domain is encoded by three exons, exons 6-8 (130, 111 and 44 nt, respectively); SRCR domains are usually encoded by one exon (group B) or two exons (group A). Exons 9-13 (169, 176, 96, 143 and 153 nt, respectively) encode the serine proteinase domain. Vertebrate serine protease-like genes have been grouped into five classes based on intron positions [23]. The organization of the epitheliasin proteinase domain is typical of the second group, which contains members of the trypsin family of serine proteases and consists of five exons, with each of the three components of the catalytic triad encoded in a different exon. In epitheliasin, the catalytic histidine is encoded in exon 9, the aspartic acid in exon 10 and the serine in exon 13. In general, the organization of epitheliasin is similar to that of other multidomain serine proteinases: each domain is encoded independently by one or more exons. A feature common to all multidomain proteases cloned to date is that the serine proteinase domain is encoded by five exons [24].

As shown in Table 1, all intron/exon junctions contain the expected GT splice donor and AG splice acceptor sites and conform to the consensus sequences established for intronic donor and acceptor splice signals [25]. Four introns are inserted between codons (type 0 splice junction), five after the first nucleotide of a codon (type I splice junction), and four after the second nucleotide of a codon (type II splice junction). Six bands were strongly positive by Southern analysis, with sizes of 7000, 5000, 2700, 1400, 1200 and 900 nt. Adding the sizes of these fragments indicates that the epitheliasin gene is approximately 18 kb.
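The splice-junction classification and the gene-size estimate are simple arithmetic, sketched below. The phase of an intron follows from the number of coding nucleotides upstream of it, taken modulo 3; the GT-AG check and the summation of Southern fragments are shown as stated in the text. Function names and the example intron sequence are hypothetical, and summing fragment sizes assumes each band is a distinct, non-overlapping piece of the locus, so the total is only approximate.

```python
from collections import Counter


def intron_phase(coding_nt_upstream):
    """Intron phase from the number of coding nucleotides upstream of it:
    0 = between codons, 1 = after codon position 1, 2 = after codon position 2."""
    return coding_nt_upstream % 3


def has_gt_ag(intron_seq):
    """True if an intron follows the GT-AG rule (GT donor, AG acceptor)."""
    s = intron_seq.upper()
    return s.startswith("GT") and s.endswith("AG")


# Junction counts reported in the text for the 13 epitheliasin introns.
reported_phases = Counter({0: 4, 1: 5, 2: 4})

# Southern blot fragment sizes in nt, as reported.
southern_bands_nt = [7000, 5000, 2700, 1400, 1200, 900]

print(sum(reported_phases.values()))                          # 13
print(sum(southern_bands_nt) / 1000)                          # 18.2
print(intron_phase(9), intron_phase(10), intron_phase(11))    # 0 1 2
print(has_gt_ag("GTAAGTCTTTCCAG"))                            # True
```

The phase counts add up to the 13 introns of the gene, and the band sizes sum to 18.2 kb, matching the approximately 18 kb estimate.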
Fig. 4. In situ hybridization of a biotin-labeled epitheliasin probe to mouse metaphase cells. The chromosome 16 homologues are identified with arrows. Specific labeling was observed at chromosome band 16C2.
Fig. 5. Northern blot analysis of epitheliasin mRNA in various mouse tissues. Each lane contained 2 μg of poly(A)+ RNA. The blot was hybridized to an epitheliasin cDNA probe.
3.5. Chromosomal assignment

FISH was performed on normal mouse chromosomes using a BAC containing the genomic sequence of epitheliasin (Fig. 4). These studies localized the epitheliasin gene to the telomeric region of the long arm of chromosome 16. The band localization was confirmed on G-banded chromosomes. The hybridization efficiency was 92.5%. No other serine proteinases have been localized to this region. The region is homologous with the so-called 'Down's syndrome region' of human chromosome 21 (21q22.2 and 21q22.3).

3.6. Expression of epitheliasin mRNA in vivo

The in vivo distribution of epitheliasin mRNA was investigated in adult mouse tissues by Northern blot analysis. As shown in Fig. 5, a prominent 2.8 kb transcript and a less prominent 1.5 kb transcript were observed in the kidney. Because preliminary results suggest an alternative polyadenylation site approximately 1.3 kb downstream of the initial polyadenylation site, we believe that the weaker 1.5 kb signal actually represents the characterized cDNA, with the 2.8 kb transcript reflecting use of the downstream site (1.5 kb + 1.3 kb ≈ 2.8 kb). A prominent 2.8 kb signal was also seen in the lung, and a weaker signal of similar size was observed in the liver. No signal was observed in heart, brain, spleen, testis or skeletal muscle. Of note, all tissues that express epitheliasin have epithelial cells as a prominent feature of their cellular makeup.

3.7. Immunohistochemical localization

Fig. 6A shows the kidney, in which only tubular epithelial cells are stained, with no staining of glomeruli. The staining is restricted to cells located in distal tubules. It is most intense at the apical pole of the cells, facing the lumen of the tubules, and faint in the cytoplasm and at the basal and lateral sides of the cells. Fig. 6B shows the lung, in which staining is primarily limited to the apical surface of airway epithelial cells. Staining is minimal or absent in the vasculature and alveolar spaces. No staining was observed in control slides. Further analysis by in situ hybridization using an epitheliasin riboprobe demonstrated that the pattern of mRNA expression was the same as that of protein expression (data not shown). These results support the epithelial and membrane localization of epitheliasin.

Fig. 6. Immunohistochemical localization of epitheliasin in adult mouse tissue. A: A section from the kidney (magnification 20×). Positive staining is seen in the apical region of renal distal tubule epithelial cells. B: A section from lung (magnification 20×). Positive staining is seen in bronchial epithelial cells. No stain was observed in control sections in which normal rabbit serum was substituted for rabbit anti-mouse epitheliasin (data not shown).

During the course of this investigation, Paolini-Giacobino and colleagues reported a human cDNA, named TMPRSS2, cloned by exon trapping [3]. The reported portion of the TMPRSS2 cDNA has approximately 80% sequence identity to epitheliasin. However, the tissue distributions of epitheliasin and TMPRSS2 are strikingly different. While epitheliasin is highly expressed in the mouse kidney, no expression of TMPRSS2 was observed in the human kidney. In contrast, no expression of epitheliasin was observed in the mouse heart or brain, while a high level of TMPRSS2 expression was observed in human heart and an intermediate level in brain. Moreover, the sizes of the epitheliasin mRNA transcript (2.8 kb) and the TMPRSS2 transcript (3.8 kb) are different. Whether TMPRSS2 is the human orthologue of epitheliasin or a closely related gene product will require further study.

The biological role of epitheliasin is not known. The
homology with CAP1 and apical membrane distribution raise
the possibility that epitheliasin may activate ion transport
channels of the plasma membrane. In addition, cell-surface
proteinases of normal and malignant cells are thought to
play roles in cell growth, chemotaxis, endocytosis, exocytosis,
blood coagulation, fibrinolysis and tissue invasion during
metastasis [26]. While the function of the non-proteinase do- 
mains is unexplored, the presence of these domains with a 
modular organization represents a common feature of regu- 
latory serine proteinases (e.g. proteinases of the fibrinolytic 
and blood coagulation systems). Studies of the kinetic effects 
of deleting the non-proteinase domain from enteropeptidase 
clearly implicate it in the recognition of macromolecular sub- 
strates and inhibitors [21]. 
Biochemistry

Enterokinase, the initiator of intestinal digestion, is a mosaic 
protease composed of a distinctive assortment of domains 

(serine proteases/trypsinogen activation)
ABSTRACT Enterokinase is a protease of the intestinal
brush border that specifically cleaves the acidic propeptide
from trypsinogen to yield active trypsin. This cleavage initiates
a cascade of proteolytic reactions leading to the activation of
many pancreatic zymogens. The full-length cDNA sequence for
bovine enterokinase and partial cDNA sequence for human
enterokinase were determined. The deduced amino acid
sequences indicate that active two-chain enterokinase is derived
from a single-chain precursor. Membrane association may be
mediated by a potential signal-anchor sequence near the amino
terminus. The amino terminus of bovine enterokinase also
meets the known sequence requirements for protein
N-myristoylation. The amino-terminal heavy chain contains domains
that are homologous to segments of the low density lipoprotein
receptor, complement components C1r and C1s, the macrophage
scavenger receptor, and a recently described motif
shared by the metalloprotease meprin and the Xenopus A5
neuronal recognition protein. The carboxyl-terminal light
chain is homologous to the trypsin-like serine proteases. Thus,
enterokinase is a mosaic protein with a complex evolutionary
history. The amino acid sequence surrounding the amino
terminus of the enterokinase light chain is ITPK-IVGG (human)
or VSPK-IVGG (bovine), suggesting that single-chain
enterokinase is activated by an unidentified trypsin-like
protease that cleaves the indicated Lys-Ile bond. Therefore,
enterokinase may not be the "first" enzyme of the intestinal
digestive hydrolase cascade. The specificity of enterokinase for
the DDDDK-I sequence of trypsinogen may be explained by
complementary basic amino acid residues clustered in
potential S2-S5 subsites.



All animals need to digest exogenous macromolecules with- 
out destroying similar endogenous constituents. The regula- 
tion of digestive enzymes is, therefore, a fundamental re- 
quirement (1). Vertebrates have solved this problem, in part, 
by using a two-step enzymatic cascade to convert pancreatic 
zymogens to active enzymes in the lumen of the gut. The 
basic features of this cascade were described in 1899 by N. P. 
Schepovalnikov, working in the laboratory of I. P. Pavlov
(2). Extracts of the proximal small intestine were shown to 
strikingly activate the latent hydrolytic enzymes in pancre- 
atic fluid. Pavlov considered this intestinal factor to be an 
enzyme that activated other enzymes, or a "ferment of 
ferments," and named it "enterokinase." The importance of 
this protease cascade is emphasized by the life-threatening 
intestinal malabsorption that accompanies congenital defi- 
ciency of enterokinase (3, 4). 

Enterokinase activates bovine trypsinogen by cleaving 
after the sequence VDDDDK, releasing an amino-terminal 
activation peptide (5,6). The acidic DDDDK sequence of the 
trypsinogen-activation peptide is conserved among
vertebrates (7), except for the similar sequences of trypsinogens
from lungfish (IEEDK and LEDDK) and African clawed frog
(FDDDK). Enterokinase prefers substrates with the
sequence DDDDK, whereas the presence of aspartate residues
markedly inhibits the ability of trypsin to cleave such sub- 
strates (8). For example, toward bovine trypsinogen the 
catalytic efficiency of enterokinase is 12,000-fold (porcine) 
(9) or 34,000-fold (bovine) (10) greater than that of bovine 
trypsin. This reciprocal specificity protects trypsinogen 
against autoactivation by trypsin and promotes activation by 
enterokinase in the gut. 

Enterokinase has been purified from porcine (11), bovine 
(10, 12, 13), human (14), and ostrich intestine (15). With the 
possible exception of human enterokinase, which was
suggested to be a heterotrimer (14), enterokinase appears to be
a disulfide-linked heterodimer with a heavy chain of 82-140
kDa and a light chain of 35-62 kDa. Mammalian
enterokinases contain 30-50% carbohydrate, which may contribute to
the apparent differences in polypeptide masses. The heavy
chain is postulated to mediate association with the intestinal 
brush border membrane (16), although no direct evidence for 
this function has been reported. The light chain contains the 
catalytic center. Based on susceptibility to inhibition by 
chemical modification of the active-site serine and histidine 
residues (9-11, 17) and on the partial amino acid sequence 
(18) and cDNA sequence of the bovine enterokinase light 
chain (19), enterokinase is a member of the trypsin-like family
of serine proteases. 

Enterokinase stands at or near the top of a regulatory 
enzyme cascade that successfully limits the activity of diges- 
tive hydrolases to the gut, but there is no structural expla- 
nation for enterokinase membrane localization, substrate 
specificity, or expression specifically in the proximal small 
intestine. To address these questions we have characterized 
cDNA clones for bovine and human enterokinase.' 

MATERIALS AND METHODS 

Materials. Purified calf enterokinase (EK-3, 131 units/μg)
was from Biozyme Laboratories (San Diego). Fresh bovine
tissues were from a local abattoir.

Amino Acid Sequencing. Enterokinase (16 μg) was reduced
with 0.5% (vol/vol) 2-mercaptoethanol, separated by
electrophoresis (20), transferred to an Immobilon P membrane
(Millipore) by electroblotting, and stained with Coomassie
brilliant blue. The excised light-chain band (≈47 kDa) was
subjected to automated Edman degradation with an Applied
Biosystems model 470A sequencer (21) equipped with a
model 120A phenylthiohydantoin analyzer.
GenBank data base (accession nos. U09859 and U09860).






Biochemistry: Kitamoto et al.

Proc. Natl. Acad. Sci. USA 91 (1994) 7589



Isolation of cDNA Clones. RNA was extracted (22) from
bovine duodenum and proximal small intestine. Single-stranded
cDNA was prepared from total RNA (10 μg) using
avian myeloblastosis virus reverse transcriptase and an
oligo(dT) primer (cDNA cycle kit, Invitrogen). The cDNA was
used for PCR amplification (30 cycles of 2-min annealing at
59°C, 2-min extension at 72°C, and 1-min denaturation)
with sense primer 5'-TAY GAR GGI GCI TGG CCI
TCG GT-3' and antisense primer 5'-AAT GGG ACC CKXT
IGA RTC ICC-3'. Products were analyzed by Southern
blotting and hybridization with 32P-labeled oligonucleotide
probe 5'-Sn WCI GCI GCC CAC TG-3'. The positive 572-bp
product was cloned to yield pBEK1.
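The sense primer above mixes fixed bases, IUPAC ambiguity codes (Y = C/T, R = A/G) and inosine (I). As a rough illustration of how much sequence diversity such a primer carries, the sketch below expands the ambiguity codes into every concrete oligonucleotide. It takes the sense primer exactly as printed and assumes inosine is counted as a single universal base rather than expanded; the function name `expand` is hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes; inosine (I) is kept as one universal base,
# so it adds no synthetic degeneracy (an assumption of this sketch).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "I": "I",
}


def expand(primer: str) -> list[str]:
    """Return every concrete oligonucleotide encoded by a degenerate primer."""
    bases = primer.replace(" ", "").upper()
    return ["".join(p) for p in product(*(IUPAC[b] for b in bases))]


sense = "TAY GAR GGI GCI TGG CCI TCG GT"
variants = expand(sense)
print(len(sense.replace(" ", "")), len(variants))
```

Only Y and R contribute here, giving 2 × 2 = 4 oligonucleotides of 23 nt. Expanding each of the three inosines as N instead would raise the count to 4 × 4^3 = 256; which convention applies depends on how degeneracy is being reported.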

Additional clones were obtained by radiolabeling the cDNA
insert of pBEK1 with [32P]dCTP (23) and screening of bovine
or human small intestine λgt11 cDNA libraries (Clontech) or
by using oligonucleotides to screen 5' rapid amplification of
cDNA ends (RACE) libraries (24). RACE libraries were
constructed with the 5' RACE system (GIBCO/BRL) using
bovine intestinal RNA and one of two sets of enterokinase-specific
primers: set 1, 5'-TTA TTG TCTF TCA TCA GAG
CCA TC-3', 5'-TGG ACA GTT TAA TTC TCC ATC ACA-3',
5'-ATC AAT TGC TAT GTA CTT TAG AGC-3'; set 2,
5'-ATT GAG ACA TTT CCT GTG ATA TCA ATG CrrG-3',
5'-TGT GGA AAG TGA CCA GTT GGC TGG ATT TAT-3',
5'-GCC TTG AAT CAG TTC TTC TT-3'. DNA sequences
were determined on both strands (25).

DNA Sequence Analysis. Sequences were compared to 
GenBank and EMBL data bases at the National Center for 
Biotechnology Information using the BLAST network server 
(26). Sequence alignments and consensus sequences were 
prepared and analyzed with the programs pileup and gap of
the Genetics Computer Group (version 7.1, Madison, WI).
The significance of gap alignments was evaluated by
comparing the optimal alignment score (x) to the mean (μ) and SD
(σ) of scores obtained for 30 alignments of randomized
sequences, using the normal distribution to estimate the 
probability that the alignment could occur by chance. 
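The significance test described above reduces to a z-score, z = (x - μ)/σ, followed by a normal tail probability. A minimal sketch, assuming the 30 randomized-alignment scores are already in hand (here they are simulated with `random.gauss`, so the numbers are hypothetical and not from the paper) and that a one-sided upper-tail probability is wanted:

```python
import random
from math import erfc, sqrt
from statistics import mean, stdev


def alignment_z_and_p(optimal_score, shuffled_scores):
    """Return z = (x - mu) / sigma and the upper-tail normal probability P(Z >= z)."""
    mu = mean(shuffled_scores)
    sigma = stdev(shuffled_scores)  # sample standard deviation
    z = (optimal_score - mu) / sigma
    p = 0.5 * erfc(z / sqrt(2))  # survival function of the standard normal
    return z, p


# Hypothetical scores: 30 randomized alignments and one optimal alignment.
random.seed(0)
shuffled = [random.gauss(20.0, 2.0) for _ in range(30)]
z, p = alignment_z_and_p(40.0, shuffled)
print(f"z = {z:.1f}, P = {p:.1e}")
```

The normal approximation is the one named in the text; it is a convenience, since real alignment-score distributions tend to have heavier upper tails than a Gaussian.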

RESULTS AND DISCUSSION 

Isolation of cDNA Clones. The bovine enterokinase light
chain was reported to contain the motif YEGAWPWVV at
residues 8-16 (18); the underlined residues are not conserved
in other serine proteases. Thirty-one residues of the
amino-terminal sequence of the bovine enterokinase light chain were
determined, and the previously reported sequence was
confirmed, except that arginine rather than tyrosine was
identified at cycle 8. This sequence was used to design a degenerate
23-mer "sense" primer that would be relatively specific for
enterokinase. A degenerate 21-mer "antisense" primer was
based on the conserved GDSGGPL motif that contains the
active-site serine of serine proteases. Upon PCR with a
bovine small intestine single-stranded cDNA template, the
major product hybridized to a probe based on the conserved
sequence near the active-site histidine. The corresponding
clone pBEK1 was used to isolate overlapping cDNAs from
bovine and human small intestine cDNA libraries.

The composite cDNA sequence for bovine enterokinase 
spans 3923 nt. Beginning at nt 113 there is an ATG codon and 
open reading frame of 3105 nt, a stop codon plus 3' untrans-
lated region of 643 nt, and a poly(A) tail of 63 nt. A poly- 
adenylylation signal of AATAAA is present 25 nt before the 
poly(A) tail. The open reading frame encodes a polypeptide of 
1035 amino acids with a calculated mass of 114.9 kDa. The 
translated amino acid sequence after residue 800 (Fig. 1) was 
identical to the 31 residues determined by Edman degradation 
of the enterokinase light chain, confirming that the cDNA 
encodes enterokinase. A segment of 81 nt that encodes amino
acid residues Ala-166-Pro-192 was present in three cDNA 



clones but absent in one (Fig. 1). This sequence is not 
delimited by splice sites and therefore may be encoded by an 
exon that is occasionally absent due to alternative splicing. 
This segment also could represent a length polymorphism. 
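The nucleotide bookkeeping in the two paragraphs above can be checked with simple arithmetic. This sketch assumes, as the numbers imply, that the 3105-nt open reading frame excludes the stop codon (which is counted with the 643-nt 3' region) and that nt 1-112 form the 5' untranslated sequence.

```python
# Segments of the composite bovine enterokinase cDNA, in nucleotides.
five_prime_utr = 112        # nt 1-112, upstream of the ATG at nt 113
orf = 3105                  # assumed to exclude the stop codon
stop_plus_three_utr = 643   # stop codon plus 3' untranslated region
poly_a_tail = 63

total_length = five_prime_utr + orf + stop_plus_three_utr + poly_a_tail
codons = orf // 3

# The optional segment: 81 nt versus residues Ala-166 through Pro-192.
optional_residues = 192 - 166 + 1

print(total_length, codons, orf % 3, 81 // 3 == optional_residues)
```

The totals agree with the stated 3923-nt composite sequence, a 1035-codon reading frame with no leftover bases, and a 27-residue optional segment.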

The partial cDNA sequence for human enterokinase cor- 
responds to amino acids 765-1035 encoded by the bovine 
sequence. In the region of overlap, the open reading frames 
of the bovine and human nucleotide sequences are ≈85%
identical, and the encoded amino acid sequences are ≈84%
identical. The 3' untranslated regions are less conserved,
exhibiting ≈67% sequence identity over 572 nt.
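Percent-identity figures such as those above depend on how aligned columns are counted. One common convention, shown here as an assumption rather than the authors' stated method, scores only columns where neither sequence has a gap; the function name and the aligned fragments are hypothetical, not the published sequences.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


# Hypothetical aligned fragments (one gap column is skipped).
print(round(percent_identity("IVGGSNAK-EG", "IVGGSDSRAEG"), 1))
```

Dividing by the full alignment length, or by the shorter sequence, would give a somewhat lower figure for the same pair, so comparisons are only meaningful under one fixed convention.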

By Northern blotting, an enterokinase mRNA species of
≈4.4 kb was detected in human small intestine, but not in
leukocytes, colon, ovary, testis, prostate, thymus, spleen, 
pancreas, kidney, skeletal muscle, liver, lung, placenta, 
brain, or heart (data not shown). This result is consistent with 
the studies of Pavlov on the distribution of enterokinase (2) 
and the immunohistochemical localization of enterokinase in 
the brush border of duodenum and jejunum (27).

Structure of the Enterokinase Catalytic Domain. In agree-
ment with LaVallie et al (19), amino acid residues 801-1035 
correspond to the enterokinase light chain, which has a 
predicted mass of 26.3 kDa, compared with 47 kDa observed 
for purified bovine intestinal enterokinase (data not shown). 
The difference reflects glycosylation of the light chain. There 
are three and four potential N-linked glycosylation sites, 
respectively, in the bovine and human enterokinase light 
chains, and digestion of bovine enterokinase with peptide:N- 
glycosidase F reduces the apparent mass of the light chain 
from 47 kDa to 35 kDa (data not shown).
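Potential N-linked glycosylation sites are conventionally identified as Asn-X-Ser/Thr sequons in which X is not proline. A minimal scanner, with a hypothetical function name and test sequence, uses a regular-expression lookahead so that overlapping sites (such as two concatenated sequons) are both reported.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The lookahead makes
# matches zero-width, so overlapping sequons (e.g. N-N-S-T) are all found.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def potential_n_glyc_sites(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(potential_n_glyc_sites("MNGTANPSQNNSTK"))
```

In the toy sequence, Asn-6 is skipped because it is followed by Pro, while Asn-10 and Asn-11 form two overlapping sequons. A sequon marks only a potential site; whether it is actually glycosylated has to be shown experimentally, for example by the peptide:N-glycosidase F digestion described above.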

The enterokinase protease domain was compared with 
other serine proteases for characteristic disulfide bond pat- 
terns and sequence similarity. Enterokinase is most similar to 
a subfamily of two-chain serine proteases that share 10 
conserved cysteine residues and in which the activation 
peptide remains attached to the protease domain by a disul- 
fide bond. The archetype of this group is chymotrypsin. By 
analogy to chymotrypsin (28, 29) and related proteases for 
which the disulfide bonds have been determined directly, the 
most likely pairings in enterokinase are as follows: Cys-788-Cys-912,
Cys-826-Cys-842, Cys-926-Cys-993, Cys-957-Cys-972,
and Cys-983-Cys-1011. The first of these disulfide bonds
joins the heavy chain and light chain. 
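Taking the predicted pairings together with the heavy/light chain boundary (bovine heavy chain residues 1-800, light chain 801-1035, as stated elsewhere in this paper), a short check confirms that the five bonds use 10 distinct cysteines and that only Cys-788-Cys-912 bridges the two chains. The constant and helper names are illustrative.

```python
# Predicted disulfide pairings in bovine enterokinase (from the text).
PAIRS = [(788, 912), (826, 842), (926, 993), (957, 972), (983, 1011)]
HEAVY_CHAIN_END = 800  # the light chain starts at residue 801


def chain_of(residue: int) -> str:
    """Assign a residue number to the heavy or light chain."""
    return "heavy" if residue <= HEAVY_CHAIN_END else "light"


cysteines = [c for pair in PAIRS for c in pair]
distinct_cysteines = len(set(cysteines))

# Bonds whose two cysteines lie in different chains.
interchain = [(a, b) for a, b in PAIRS if chain_of(a) != chain_of(b)]

print(distinct_cysteines, interchain)
```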

The amino acid sequence of the enterokinase protease 
domain is strikingly similar to the blood coagulation prote- 
ases factor XI (30) and prekallikrein (31) and to hepsin, an
unusual serine protease with a possible transmembrane do- 
main near the amino terminus (32). Enterokinase exhibits the 
expected conservation of serine protease sequence motifs; in 
particular, the active-site residues can be identified as His- 
841, Asp-892, and Ser-987 (Fig. 1). Compared with factor XI, 
hepsin, and chymotrypsin, the human enterokinase light 
chain has 41%, 44%, and 35% identical amino acid residues. 
The percentages for the bovine enterokinase comparisons are 
similar. Enterokinase and factor XI appear to share two 
potential N-linked glycosylation sites, whereas hepsin has no 
N-linked glycosylation sites. 

The specificity of enterokinase for cleavage after lysine is 
consistent with the presence of Asp-981 at the base, and 
Gly-1008 and Gly-1018 at the sides of the specificity pocket 
or S1 subsite that binds the substrate P1 residue (Fig. 1). The
requirement for aspartate in the P2-P5 positions suggests that 
the surface of enterokinase should provide electrostatic com- 
plementarity to negatively charged side chains. Examination 
of the homologous three-dimensional structure of chymo- 
trypsin suggests that several exposed surface loops of enter- 
okinase (Fig. 1, segments a-d) might contact these substrate 
residues. Within these segments, there are a few positively 
charged residues that are present in both bovine and human
enterokinase but absent from related proteases with different
specificity for the P2-P5 substrate residues. In particular, the
RRRK (human) or KRRK (bovine) sequences between res- 
idues 886-889 (Fig. 1, segment b) may interact directly with 
the aspartate residues in enterokinase substrates. 
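The P and S labels above follow the Schechter-Berger convention. Substrate residues are numbered P1, P2 and so on toward the amino terminus from the scissile bond, and P1', P2' and so on toward the carboxyl terminus, with matching enzyme subsites S1, S2, and so on. A small helper (hypothetical name) applied to the trypsinogen cleavage site VDDDDK-I shows that the four aspartates occupy P2-P5 and lysine occupies P1.

```python
def schechter_berger(substrate: str, p1_index: int) -> dict[str, str]:
    """Map Schechter-Berger labels to residues around a scissile bond.

    p1_index is the 0-based position of the P1 residue; the cleaved bond
    lies between substrate[p1_index] and substrate[p1_index + 1].
    """
    labels = {}
    for i, residue in enumerate(substrate):
        if i <= p1_index:
            labels[f"P{p1_index - i + 1}"] = residue  # unprimed (N-terminal) side
        else:
            labels[f"P{i - p1_index}'"] = residue     # primed (C-terminal) side
    return labels


site = schechter_berger("VDDDDKI", p1_index=5)
p5_to_p2 = "".join(site[f"P{n}"] for n in range(5, 1, -1))
print(site["P1"], site["P1'"], p5_to_p2)
```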

The synthesis of enterokinase as a single-chain protein 
poses a conceptual problem because it indicates that "proenterokinase"
itself must be activated by proteolytic cleavage.
The responsible protease could act on proenterokinase
intracellularly during biosynthesis or extracellularly. Although
the reaction could be autocatalytic, the participation of a 
separate protease seems more likely. In that case, enteroki- 
nase would not be strictly at the top of the digestive hydrolase 
cascade but would be in the second position at best. The 
amino-terminal isoleucine of the enterokinase light chain is 
preceded by Ser-Pro-Lys (bovine) or Thr-Pro-Lys (human),
suggesting that enterokinase is activated by a trypsin-like 
enzyme. The identity and location of the proenterokinase 
activator may indicate another level in the control of diges- 
tion. 

Structural Motifs of the Enterokinase Heavy Chain. The
nucleotide sequence around the codon for Met-1 is
AAAATGG, and that for Met-20 is GTCATGT. Only the
former sequence matches at both positions -3 and +4 the
consensus sequence proposed for translation initiation in
vertebrate mRNAs (33), suggesting that initiation at Met-1 is
more likely. There is no in-frame termination codon within
the available 112 nt of putative 5' untranslated sequence, so
it is possible that the initiation codon remains to be cloned.
However, initiation at Met-1 predicts a bovine enterokinase
heavy chain of 800 amino acids with a mass of 88.6 kDa (Fig.
1), and this is consistent with the ≈763 amino acids and ≈84
kDa estimated by compositional analysis of purified
enterokinase (12). By SDS/gel electrophoresis, the apparent mass
of the heavy chain was ≈116 kDa, decreasing to ≈82 kDa
after removal of N-linked oligosaccharides with peptide:N-
glycosidase F (data not shown). This decrease in mass is
consistent with the reported carbohydrate composition of
enterokinase (10, 12), and there are 17 potential N-linked
glycosylation sites in the sequence of the heavy chain (two
are concatenated) (Fig. 1).
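The comparison with the vertebrate initiation consensus comes down to checking two positions, counting the A of ATG as +1: a purine (A or G) at -3 and G at +4. This sketch reads the Met-1 context as AAAATGG and the Met-20 context as GTCATGT; the function name is hypothetical.

```python
def kozak_positions(context: str, atg_index: int) -> tuple[bool, bool]:
    """Return (purine at -3, G at +4) for the ATG starting at atg_index (0-based).

    The A of ATG is +1, so -3 is three nucleotides upstream of it and +4 is
    the nucleotide immediately after the codon (there is no position 0).
    """
    seq = context.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("context too short to inspect -3 and +4")
    purine_minus3 = seq[atg_index - 3] in "AG"
    g_plus4 = seq[atg_index + 3] == "G"
    return purine_minus3, g_plus4


met1 = kozak_positions("AAAATGG", 3)
met20 = kozak_positions("GTCATGT", 3)
print(met1, met20)
```

Both contexts have a purine at -3, but only the Met-1 context also has G at +4, which is the basis for favoring Met-1 as the initiation codon.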

The hydrophobic 29-residue sequence from Val-19 through 
Val-47 could serve as a signal peptide. If it were not cleaved 
by signal peptidase, this segment could function as a signal- 
anchor sequence and account for the membrane association 
of enterokinase. The amino-terminal sequence also is
compatible with the substrate specificity of myristoyl-CoA:protein
N-myristoyltransferase (34), suggesting that Gly-2 may
be myristoylated and thereby provide another mechanism for 
membrane targeting during biosynthesis. 
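Hydrophobic segments such as a candidate signal-anchor sequence are often located with a sliding-window hydropathy average. The sketch below uses the Kyte-Doolittle scale; the 19-residue window, the helper names, and the toy sequence are assumptions for illustration, not the analysis used in the paper.

```python
# Kyte & Doolittle (1982) hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full-length window along the sequence."""
    values = [KD[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def most_hydrophobic_window(protein: str, window: int = 19) -> tuple[int, float]:
    """Return the 1-based start of the highest-scoring window and its mean score."""
    profile = hydropathy_profile(protein, window)
    best = max(range(len(profile)), key=profile.__getitem__)
    return best + 1, profile[best]


# Hypothetical toy sequence: polar flanks around a 19-residue hydrophobic stretch.
toy = "MGSKRDEQ" + "LLVVLLAVILFLVVLLAIL" + "NRKEDSQT"
start, score = most_hydrophobic_window(toy)
print(start, round(score, 2))
```

A window near the length of a membrane-spanning helix is a common choice. A 29-residue segment like the one described above would show up as a broad plateau of high window scores rather than a single sharp peak.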

The heavy chain of enterokinase contains five domains that 
are related to four different structural motifs found in other 
[Fig. 1 alignment: EKhu, EKbov, FXI, Hepsin, Chta, and consensus sequences; numbering refers to bovine enterokinase residues 1-1035]

Fig. 1. Translated amino acid sequence of enterokinase cDNA clones and alignment with other serine proteases. The aligned sequences
include human enterokinase (EKhu), bovine enterokinase (EKbov), human factor XI (FXI), human hepsin (Hepsin), bovine chymotrypsinogen
A (Chta), and a consensus sequence. Numbering at right refers to the translated sequence of bovine enterokinase. Cysteine residues are in
boldface type. Potential N-linked glycosylation sites are in boldface underlined type. The potential signal-anchor sequence is double underlined.
The potential alternative exon is indicated by a dotted underline. Sequence motifs in the heavy chain are indicated by numbered underlines.
Segments of the protease domain that may interact with substrate amino acids are indicated by lettered underlines (a-d). The cleavage site for
zymogen activation (▲), active site residues (♦), and residues in the specificity pocket or S1 subsite (*) are indicated below the consensus
sequence.
[Fig. 2: enterokinase domain schematic and alignments of EK domains with related motifs, e.g., EK3 (620-651) with tolloid and C1r repeats]
Fig. 2. Structural motifs in enterokinase. Numbers in parentheses refer to the amino acid residues represented in each aligned sequence.
Bovine enterokinase (EK) residues are numbered as in Fig. 1. (A) Schematic structure of enterokinase, indicating the proposed signal-anchor
sequence (SA), alternative exon (AE), numbered heavy chain domains (LDLR, low-density-lipoprotein receptor; MSCR, macrophage scavenger
receptor), and the protease domain with active-site residues histidine (H), aspartate (D), and serine (S). The cleavage site between the heavy
and light chains (arrowhead) and the disulfide bond connecting them are shown. (B) Alignment of EK domains 1 and 4 with cysteine-rich motifs
of the LDL receptor (LDLR) (35). (C) Alignment of EK domain 2 with segments of Xenopus laevis A5 antigen (A5xen) (36), mouse meprin A
(37), and rat meprin B (38). (D) Alignment of EK domain 3 with selected C1r/s-like domains of Drosophila melanogaster tolloid (39) and
complement component C1r (40). (E) Alignment of EK domain 5 with repeated domains of the mouse macrophage scavenger receptor type I
(MSCR) (41) and the speract cross-linking protein from sea urchin sperm (42). The significance of alignments was estimated as described under
Materials and Methods: EK1 or EK4 versus LDLR motifs, P < 10^-…; EK2 versus meprin motifs, P < 10^-…; EK3 versus C1r/s motifs, P <
10^-…; EK5 versus MSCR motifs, P ≈ 3.7 × 10^-….






protein families, indicating that enterokinase is a mosaic
protein with a complex evolutionary history. The particular
combination of motifs is specific and surprising (Fig. 2A).
Enterokinase domains 1 and 4 are homologous to an ≈40-amino
acid cysteine-rich repeat found in the amino-terminal
domain of the low-density lipoprotein receptor and also in
several complement proteins (Fig. 2B) (35).

Enterokinase domain 2 (Fig. 2C) is homologous to ≈110-amino
acid segments of meprins A and B, which are
membrane-bound metalloproteases of renal glomeruli (37, 38).
This domain also is homologous to a segment of the A5
protein of X. laevis (36), which may mediate neuronal
recognition. For this structural motif, identified in four distinct
vertebrate proteins, we propose the name "meprin domain."

Enterokinase domain 3 (Fig. 2D) is homologous to a family
of ≈120-amino acid repeats reported in complement serine
protease C1r (40) and subsequently found in many proteins
including the product of the Drosophila dorsal-ventral
patterning gene tolloid (39). Interestingly, tolloid also encodes a
separate metalloprotease domain that is homologous to the
metalloprotease domains of meprins A and B.

Enterokinase domain 5 (Fig. 2E) is homologous to ≈110-amino
acid cysteine-rich motifs that are found in the macrophage
scavenger receptor (41), the sea urchin spermatozoa
speract receptor (42), and several lymphocyte cell-surface 
antigens (41). This domain in enterokinase is truncated at the 
carboxyl end. 

The structural domains of the enterokinase heavy chain are 
found in proteins of the complement cascade, in endocytic 
receptors for diverse ligands including lipoproteins, in pro- 
teins that regulate development, in receptors that contribute 
to the specificity of egg fertilization, and in proteins of 
unknown function. The particular combination of structural 
motifs observed in the enterokinase heavy chain is unprec- 
edented. The presence of potential ligand-binding domains 
suggests that interaction with other macromolecules, either 
in the cell membrane or in the lumen of the gut, might 
modulate enterokinase activation, substrate specificity, or 
inhibition. 

For nearly a century enterokinase has been known as the 
principal activator of digestive hydrolases, and the same 
basic regulatory mechanism appears to be conserved among 
all vertebrates. The physiologic importance of this mecha- 
nism is emphasized by the severe malabsorption that accom- 
panies human enterokinase deficiency (3, 4). The apparent 
requirement for proteolytic activation of proenterokinase 
suggests that yet another protease is required for the normal 
regulation of pancreatic zymogens. The isolation of cDNA 
clones for human and bovine enterokinase provides the 
means to address the regulation and structure-function re- 
lationships of this ancient, essential protease. 
ABSTRACT: Enterokinase is a serine protease of the duodenal brush border membrane that cleaves
trypsinogen and produces active trypsin, thereby leading to the activation of many pancreatic digestive
enzymes. Overlapping cDNA clones that encode the complete human enterokinase amino acid sequence
were isolated from a human intestine cDNA library. Starting from the first ATG codon, the composite
3696 nt cDNA sequence contains an open reading frame of 3057 nt that encodes a 784 amino acid heavy
chain followed by a 235 amino acid light chain; the two chains are linked by at least one disulfide bond.
The heavy chain contains a potential N-terminal myristoylation site, a potential signal anchor sequence
near the amino terminus, and six structural motifs that are found in otherwise unrelated proteins. These
domains resemble motifs of the LDL receptor (two copies), complement component C1r (two copies),
the metalloprotease meprin (one copy), and the macrophage scavenger receptor (one copy). The
enterokinase light chain is homologous to the trypsin-like serine proteinases. These structural features
are conserved among human, bovine, and porcine enterokinase. By Northern blotting, a 4.4 kb enterokinase
mRNA was detected only in small intestine. The enterokinase gene was localized to human chromosome
21q21 by fluorescence in situ hybridization.
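The chain lengths in this abstract can be reconciled with the open reading frame length, assuming (as the numbers imply) that the 3057 nt exclude the stop codon. The bovine values reported earlier in this document (heavy chain Met-1 through residue 800, light chain 801-1035) give the same 235-residue light chain.

```python
# Human enterokinase (this abstract).
human_heavy, human_light, human_orf_nt = 784, 235, 3057
# Bovine enterokinase: heavy chain Met-1..800, light chain 801..1035.
bovine_heavy, bovine_total = 800, 1035

human_total = human_heavy + human_light
bovine_light = bovine_total - bovine_heavy

print(human_total, human_orf_nt // 3, human_orf_nt % 3, bovine_light)
```

The human heavy and light chains together account for all 1019 codons of the reading frame, and the light chains of the two species have the same length.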
Enterokinase was discovered by N. P. Schepovalnikov,
in I. P. Pavlov's laboratory, as an activity of small intestinal
mucosa that dramatically increased the proteolytic activity
of pancreatic fluid (Pavlov, 1902). Enterokinase later was
shown to be an enzyme (Kunitz, 1939) that cleaves the
amino-terminal activation peptide from trypsinogen to
produce trypsin (Davie & Neurath, 1955; Yamashina, 1956).
This reaction permits the subsequent activation of other
pancreatic zymogens by trypsin. The physiologic importance
of this two-step proteolytic cascade is indicated by the
intestinal malabsorption that is caused by congenital
deficiency of enterokinase (Hadorn et al., 1969; Haworth et al.,
1971).

Enterokinase has been purified from bovine (Anderson et
al., 1977; Liepnieks & Light, 1979; Fonseca & Light, 1983),
porcine (Baratti et al., 1973), human (Magee et al., 1981),
and ostrich intestine (Naude et al., 1993). In most
preparations, enterokinase appears to be a disulfide-linked
heterodimer composed of an 82-140 kDa heavy chain and a
35-62 kDa light chain, although a trimeric structure also
has been proposed for human (Magee et al., 1981) and
porcine (Matsushima et al., 1994) enterokinase. Both chains
of mammalian enterokinases contain 30-50% carbohydrate.

Recently, the full-length amino acid sequences of bovine
(LaVallie et al., 1993; Kitamoto et al., 1994) and porcine
(Matsushima et al., 1994) enterokinase and a partial sequence
of human enterokinase (Kitamoto et al., 1994) were
determined indirectly by cDNA cloning. Active enterokinase
appears to be a two-chain protein derived from a single-chain
precursor. The putative heavy chain contains a
hydrophobic potential signal-anchor sequence near the amino
terminus, as well as several domains that are homologous
to structural motifs found in other proteins. The light chain
contains the catalytic center, and enterokinase is a member
of the trypsin-like family of serine proteases.

Many facts remain unknown concerning the structure and 
function of enterokinase. Although enterokinase appears to
be an intrinsic membrane protein, the mechanism of
membrane association is unknown. Furthermore, single-chain
proenterokinase is proteolytically cleaved to generate active
two-chain enterokinase, but the enzyme that is responsible
for proenterokinase activation has not been identified.

To facilitate the study of human enterokinase membrane 
localization and zymogen activation, we have characterized 
cDNA clones that encode the complete amino acid sequence 
of human proenterokinase. These clones were employed to 
localize the human enterokinase gene to human chromosome
21q21 by fluorescence in situ hybridization. 

EXPERIMENTAL PROCEDURES 

Isolation of cDNA Clones. The partial human enterokinase
cDNA insert contained in plasmid pHEK6 (Kitamoto et al.,
1994) was labeled with [32P]dCTP by a random primer
method (Feinberg & Vogelstein, 1983) and employed to
Figure 1: Domain structure of human enterokinase and map of enterokinase cDNA clones. The structure of the enterokinase cDNA is
indicated schematically at the top. The 5' and 3' untranslated regions are indicated by thin lines at the extreme left and right ends. The
locations are indicated for a proposed signal-anchor domain (SA) and serine protease domain with active site histidine (H), aspartate (D),
and serine (S) residues. The locations are shown of the cleavage site between the heavy and light chains (arrowhead) and of the predicted
disulfide bond that connects them. The enterokinase heavy chain contains repeated motifs (numbered 1–6) that are homologous to domains
of other proteins: LDLR, a low-density lipoprotein receptor cysteine-rich repeat (Sudhof et al., 1985); C1r/s, a repeat type found in complement
components C1r and C1s (Leytus et al., 1986) and also found in the Drosophila dorsal–ventral patterning gene tolloid (Shimell et al.,
1991); MAM, a domain homologous to members of a family defined by motifs in the mammalian metalloprotease meprin, the X. laevis
neuronal protein A5, and the protein tyrosine phosphatase μ (Beckmann & Bork, 1993); MSCR, macrophage scavenger receptor cysteine-rich
motif (Freeman et al., 1990) also found in sea urchin spermatozoa speract receptor (Dangott et al., 1989). The relationships among
eight overlapping cDNA clones are indicated. The scale in kilobases (kb) of DNA is indicated at the bottom left.



screen a human small intestine cDNA library in the
bacteriophage λgt11 vector (Clontech). The cDNA inserts of
plaque-purified isolates were subcloned into plasmid
pBluescript M13+ or pBluescript II KS+ (Stratagene) for DNA
sequencing (Sanger et al., 1977).

DNA Sequence Analysis. Sequences were compared to
GenBank and EMBL databases at the National Center for
Biotechnology Information using the BLAST network server
(Gish & States, 1993). Sequence alignments and consensus
sequences were prepared and analyzed with the programs
pileup, gap, and pretty of the Genetics Computer Group
(version 7.1, Madison, WI) as described previously
(Kitamoto et al., 1994).
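The alignment bookkeeping described above can be illustrated with a short sketch. It is not the GCG pileup/gap/pretty software the authors used: it assumes the sequences are already aligned (gaps written as "-"), computes percent identity over ungapped columns, and builds a display string in the style of Figure 3 (uppercase where all sequences agree, lowercase otherwise). The function names and the gap-handling convention are illustrative assumptions.

```python
from collections import Counter


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped, so identity
    is measured over aligned residues only (one common convention; other
    tools divide by the full alignment length instead).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def conservation_string(aligned: list[str]) -> str:
    """Per-column display in the style of Figure 3: the residue in uppercase
    when every sequence has the same residue, otherwise the most common
    residue in lowercase ('-' if the column is all gaps)."""
    out = []
    for column in zip(*aligned):
        residues = [r.upper() for r in column if r != "-"]
        if not residues:
            out.append("-")
            continue
        top, _ = Counter(residues).most_common(1)[0]
        fully_conserved = len(residues) == len(column) and len(set(residues)) == 1
        out.append(top if fully_conserved else top.lower())
    return "".join(out)


if __name__ == "__main__":
    # Toy fragments, not real enterokinase sequence.
    print(percent_identity("MKTAYIA", "MKSAYLA"))
    print(conservation_string(["MKTA", "MKSA", "MKTA"]))
```

Excluding gap columns from the denominator is a design choice; percentages such as the ~82% amino acid identity quoted in the text depend on which convention the original software applied.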

Northern Blotting. The insert of human enterokinase
cDNA clone HEK1 or human β-actin (Gunning et al., 1983)
was labeled with [32P]dCTP (Feinberg & Vogelstein, 1983).
A Northern blot of poly(A)+ RNA (Clontech) from assorted
human tissues (2 μg/lane) was hybridized (Sambrook et al.,
1989) with the radiolabeled HEK1 insert (1 × 10^[illegible] cpm/mL)
and washed three times for 15 min at room temperature in
2× SSC and 0.05% SDS (1× SSC is 15 mM sodium
citrate, pH 7.0, 0.15 M NaCl). The final stringent wash
condition was 50 °C, 15 min, in 0.1× SSC and 0.1% SDS.
The blot was exposed to X-ray film for 10 days. The blot
was stripped of radiolabeled HEK1 by immersion in 0.5%
SDS for 10 min at 100 °C. The stripped blot was hybridized
with the radiolabeled β-actin probe, washed as described
above, and exposed to X-ray film for 2 h.
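The wash stringency above is governed largely by salt concentration. As a minimal sketch, the component concentrations of the 2× and 0.1× SSC washes follow directly from the 1× definition given in the text. The total Na+ figure assumes the citrate is the trisodium salt (three Na+ per citrate), which is the usual SSC recipe but is not stated in the text.

```python
# 1x SSC as defined in the text: 15 mM sodium citrate (pH 7.0), 0.15 M NaCl.
CITRATE_MM_1X = 15.0
NACL_MM_1X = 150.0
NA_PER_CITRATE = 3  # assumption: trisodium citrate, the usual SSC salt


def ssc_composition(strength: float) -> dict[str, float]:
    """Millimolar concentrations in an SSC solution of the given x-strength."""
    citrate = CITRATE_MM_1X * strength
    nacl = NACL_MM_1X * strength
    return {
        "citrate_mM": citrate,
        "NaCl_mM": nacl,
        "total_Na_mM": nacl + NA_PER_CITRATE * citrate,
    }


for label, strength in [("room-temperature washes", 2.0), ("final stringent wash", 0.1)]:
    c = ssc_composition(strength)
    print(f"{strength:g}x SSC ({label}): {c['citrate_mM']:g} mM citrate, "
          f"{c['NaCl_mM']:g} mM NaCl, {c['total_Na_mM']:g} mM total Na+")
```

Under this assumption the final wash carries about 19.5 mM total Na+, compared with about 390 mM in the 2× SSC washes. This 20-fold reduction, together with the 50 °C temperature, is what makes the final wash stringent.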

Gene Mapping by in Situ Hybridization. Fluorescence in
situ hybridization was performed as described (Lichter et
al., 1988). Human prometaphase chromosome spreads were
prepared from cultured phytohemagglutinin-stimulated
peripheral blood leukocytes from a male with a normal
karyotype (46,XY). Extended chromosomes were produced



* Abbreviations: kb, kilobase; nt, nucleotide; SSC, standard saline
citrate (15 mM sodium citrate, pH 7.0, 0.15 M NaCl); SDS, sodium
dodecyl sulfate.



by colchicine treatment (Yunis, 1976). Plasmids pHEK1 and
pHEK6 contain the human enterokinase cDNA inserts of
bacteriophage λgt11 isolates HEK1 and HEK6, respectively,
cloned into plasmid pBluescript M13+. Equal amounts of
pHEK1 and pHEK6 were mixed, and 150 ng of DNA was
labeled with biotin-11-dUTP by nick translation (Rigby et
al., 1977). The biotinylated product was hybridized to human
chromosomal spreads (Lichter et al., 1988). To detect sites
of hybridization, slides were incubated sequentially with
fluorescein isothiocyanate-conjugated avidin DCS (5 μg/mL)
and fluorescein isothiocyanate-conjugated goat anti-avidin
D antibodies (5 μg/mL), followed by counterstaining with
4',6-diamidino-2-phenylindole dihydrochloride (200 ng/mL)
and propidium iodide (200 ng/mL). After fluorescent
hybridization, cytogenetic banding patterns were visualized
by staining with Giemsa.

RESULTS AND DISCUSSION 

Isolation of cDNA Clones. A human small intestine λgt11
cDNA library was screened with the insert of a partial human
enterokinase cDNA clone, HEK6 (Kitamoto et al., 1994).
Seven positives were identified among 1.5 × 10^[illegible] plaques
screened. Clones HEK12, HEK18, and HEK19a were
characterized further by restriction mapping and sequencing
(Figure 1). The cDNA insert of HEK19a was employed to
rescreen the library, and the longest clone obtained (HEK27)
was sequenced.

The composite cDNA sequence of human enterokinase
(Figure 2) was determined on both strands. Beginning at nt
41 there is an ATG codon and an open reading frame of 3057
nt, followed by a stop codon and a 3' noncoding region of
599 nt. The open reading frame encodes a polypeptide of
1019 amino acids with a calculated mass of 112.9 kDa. The
coding regions of the human and bovine (Kitamoto et al.,
1994) nucleotide sequences are ~85% identical, and the
encoded amino acid sequences are ~82% identical. The 3'
noncoding regions are less conserved, with ~67% identity
between human and bovine enterokinase cDNA sequences
Figure 2: Nucleotide and translated amino acid sequence of human enterokinase. Numbering at the right indicates the nucleotide or amino
acid residue at the end of each line. Amino acids are shown in single-letter code. The termination codon is shown by an asterisk (*). The
sequences contained in individual cDNA clones are as follows: HEK27, nt 1–2362; HEK19a, nt 948–2139; HEK18, nt 1451–2788;
HEK12, nt 1591–3045; HEK6, nt 1762–2714; HEK3, nt 2278–2714; HEK1, nt 2454–3668; HEK5, nt 2511–3969.
Figure 3: Alignment of human (Hek), bovine (Bek) (Kitamoto et al., 1994), and porcine (Pek) (Matsushima et al., 1994) enterokinase
amino acid sequences. Amino acids are shown in single-letter code. Residues that are identical in all three species are in capital letters;
unconserved residues are in lower case. Numbering at the right refers to the translated amino acid sequence of each species of enterokinase.
Cysteine residues are in boldface type. Potential N-linked glycosylation sites are in boldface underlined type. The potential signal anchor
sequence is double underlined. The location of a potential alternatively spliced exon in bovine enterokinase is indicated by a dotted underline.
This segment is notably variable among the aligned species. Sequence motifs in the heavy chain are indicated by numbered underlines that
correspond to the domains shown in Figure 1.



over 599 nt. A similar degree of sequence identity is 
apparent when either the human or bovine enterokinase 
sequences are compared to the porcine enterokinase cDNA 
sequence (Matsushima et al., 1994). 
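The reported numbers are internally consistent: 3057 nt divided by 3 nt per codon gives 1019 codons, the initiator Met included and the stop codon counted separately, which matches the 1019-residue polypeptide. A minimal sketch of the same bookkeeping on a sequence string is shown below. The function name and the convention of excluding the stop codon from the ORF length are assumptions chosen to match how the text reports the numbers. This is not the authors' software.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_orf(seq: str, start_nt: int):
    """Codon-counting bookkeeping for an ORF that starts at a 1-based ATG.

    Steps through codons in frame until the first stop codon. Returns
    (orf_nt, n_codons, stop_nt), where orf_nt excludes the stop codon (as in
    '3057 nt, followed by a stop codon') and stop_nt is the 1-based position
    of the stop codon's first base. Returns None if the frame runs off the
    end of the sequence without a stop.
    """
    seq = seq.upper()
    i = start_nt - 1  # 0-based index of the A in ATG
    if seq[i:i + 3] != "ATG":
        raise ValueError(f"no ATG at nt {start_nt}")
    for j in range(i, len(seq) - 2, 3):
        if seq[j:j + 3] in STOP_CODONS:
            orf_nt = j - i
            return orf_nt, orf_nt // 3, j + 1
    return None


# The reported ORF is a whole number of codons.
assert 3057 % 3 == 0 and 3057 // 3 == 1019

# Toy sequence (not enterokinase): 2-nt leader, ATG + 3 Ala codons, then TAA.
print(read_orf("CCATGGCTGCTGCTTAAGG", 3))
```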

Structural Features of Human Enterokinase. Most structural
elements of human enterokinase are highly conserved
(Figure 3). The similarities among the human, bovine, and
porcine enterokinase sequences suggest that the mature
proteins consist of two polypeptide chains derived by
processing of a single-chain precursor. A potential
myristoylation site is present at Gly2 (Rudnick et al., 1993).
Amino acid residues 19–43 are hydrophobic and may
constitute a signal-anchor sequence. The putative heavy
chain contains six sequence motifs that appear to be
homologous to four types of domains found in other proteins
(Figure 4). As reported previously (Kitamoto et al., 1994),
the cleavage site after Lys784 separates the heavy and light
chains of enterokinase, and the light chain is homologous to
the trypsin-like family of serine proteases. In all three cloned
enterokinases, the sequence surrounding this cleavage site
is consistent with the known substrate specificity of trypsin.
Enterokinase domains 1 and 5 are homologous to cysteine-rich
repeats in the low-density lipoprotein receptor (Sudhof
et al., 1985); domain 6 is homologous to a segment of the
macrophage scavenger receptor (Freeman et al., 1990), as
reported previously (Kitamoto et al., 1994).
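The text does not say how the hydrophobicity of residues 19–43 was assessed. One widely used approach for flagging candidate signal-anchor or membrane-spanning segments averages the Kyte–Doolittle hydropathy index over a sliding window. A window of 19 residues with a threshold mean of 1.6 is commonly used for transmembrane candidates. The sketch below is a hypothetical illustration of that technique, with those window and threshold defaults as assumptions. It does not reproduce the authors' analysis.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; entry k covers residues k+1..k+window (1-based).

    Raises KeyError for non-standard residue letters such as X.
    """
    values = [KD[aa] for aa in protein.upper()]
    return [sum(values[k:k + window]) / window for k in range(len(values) - window + 1)]


def hydrophobic_segments(protein: str, window: int = 19, threshold: float = 1.6):
    """Merge windows whose mean exceeds `threshold` into 1-based (start, end) spans."""
    spans: list[tuple[int, int]] = []
    for k, score in enumerate(hydropathy_profile(protein, window)):
        if score <= threshold:
            continue
        start, end = k + 1, k + window
        if spans and start <= spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], end)  # overlapping window: extend the span
        else:
            spans.append((start, end))
    return spans


# Toy sequence: a 25-residue Leu stretch flanked by charged Asp residues.
toy = "D" * 10 + "L" * 25 + "D" * 10
print(hydrophobic_segments(toy))
```

Every window that clears the threshold contributes its full length. As a result, the reported span (residues 6–40 for the toy sequence) is wider than the 25-residue hydrophobic core at residues 11–35. Segment boundaries from such profiles are therefore approximate.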

During the analysis of the bovine enterokinase sequence
(Kitamoto et al., 1994), domain 4 was recognized as a
member of a sequence family that includes two motifs
identified first in complement component C1r (Leytus et al.,
1986). Domain 2 of porcine enterokinase then was found
to belong to the same sequence family (Matsushima et al.,
1994). As indicated in Figures 3 and 4, two C1r/s domains
clearly are present in human, bovine, and porcine
enterokinase.

Domain 3 of bovine enterokinase (Kitamoto et al., 1994)
was recognized as homologous to segments of the
metalloproteases meprin A (Jiang et al., 1992) and meprin B
(Johnson & Hersh, 1992) and to a domain of the A5 protein
of Xenopus laevis (Takagi et al., 1991). The name "meprin
domain" was suggested for this motif (Kitamoto et al., 1994).
However, a previous report had described the same motif in
meprins, the Xenopus A5 protein, and in the extracellular
domain of receptor protein tyrosine phosphatase μ (Gebbink
et al., 1991); the name "MAM" domain was proposed (for
"meprin", "A5", and "mu") (Beckmann & Bork, 1993). The
[Figure caption, only partly legible in the scan: "... 1 and 3. Domains Hek-2 and Hek-4 are aligned ... The significance of gap alignments was evaluated by ... (Shimell et al., 1991) ... obtained for ... alignments of randomized sequences, using the ..."]

recently cloned receptor protein tyrosine phosphatase κ also
contains a MAM domain (Jiang et al., 1993).

The function of the enterokinase heavy-chain domains is
unknown. Related domains in other proteins appear to bind
ligands or mediate protein–protein interactions. For
example, the α-subunit of mouse meprin A associates with the
β-subunit, possibly through MAM domains in each subunit.
This association is required for membrane localization of
the mature α-subunit, which lacks a membrane-spanning
domain (Marchand et al., 1994). Thus, the MAM domain
of enterokinase could interact with other proteins that
contribute to membrane localization or enzyme activity. A
role for the heavy chain in determining substrate specificity
would be consistent with the reported ability of heating
(Barns & Elmslie, 1974; Anderson et al., 1977), acetylation
of amino groups (Baratti & Maroux, 1976), or dissociation
of the light chain by partial reduction (Light & Fonseca,
1984) to selectively impair enterokinase activity toward
trypsinogen without markedly affecting activity toward small
amides or esters.

A few segments of the enterokinase heavy chain show a
notable lack of sequence conservation. A potential
alternatively spliced sequence of 81 nt was identified in several
bovine enterokinase cDNA clones (Kitamoto et al., 1994)
and was present in porcine enterokinase (Matsushima et al.,
1994). This segment overlaps with a 45 nt deletion in human
enterokinase that shortens the heavy chain by 15 amino acids
and deletes one potential N-linked glycosylation site (Figure
3), suggesting that this region may tolerate some variation
in length. This variable segment is rich in hydroxyamino
acids, especially in porcine enterokinase, for which 13 of these
27 amino acids are serine or threonine (Matsushima et al.,
1994). Because of its striking amino acid composition, this
segment was suggested as a possible site of O-linked
glycosylation (Matsushima et al., 1994), although direct
evidence for this modification has not been reported. In
human enterokinase, this segment contains only four
hydroxyamino acids (Figure 3).

Human and porcine enterokinase also lack one amino acid
residue that is found in bovine enterokinase domain 2 (Figure
3); this deletion removes two possible concatenated N-linked
glycosylation sites. Several additional glycosylation sites are
not conserved, so that human, bovine, and porcine
enterokinase heavy chains have 14, 17, and 18 potential
N-glycosylation sites, respectively.
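Potential N-linked glycosylation sites are conventionally identified by the sequon Asn-X-Ser/Thr, where X is any residue except Pro. The text does not state which tool was used to count them. The sketch below is an illustrative scanner for that motif, and its names are hypothetical. A zero-width lookahead lets overlapping ("concatenated") sequons each be counted, as in the two adjacent sites discussed above.

```python
import re

# Potential N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping ("concatenated") sequons each be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein: str) -> list[int]:
    """1-based positions of the Asn in each potential N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Toy peptide: N2-N3-S4-T5 holds two overlapping sequons; N8-P9-S10 is excluded.
print(n_glycosylation_sites("ANNSTQGNPSK"))
```

A scan like this finds only candidates. Whether a given Asn is actually glycosylated has to be shown experimentally, which is why the text speaks of potential sites.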
Figure 5: Expression of enterokinase in human tissues. A Northern
blot of human poly(A)+ RNA from assorted human tissues (2 μg/lane)
was hybridized with radiolabeled cDNA probes as described
under Experimental Procedures. The upper panel shows hybridization
with an enterokinase cDNA probe derived from clone HEK1,
exposed to X-ray film for 10 days. The lower panel shows the same
blot after being stripped and rehybridized with human β-actin cDNA
probe, exposed for 2 h. The mobility of RNA size standards is
indicated at the left.

Tissue Distribution of Enterokinase mRNA. By Northern
blotting of human poly(A)+ RNA, an ~[illegible].4 kb mRNA for
enterokinase was detected in small intestine. No expression
was observed in leukocytes, colon, ovary, testis, prostate,
thymus, spleen, pancreas, kidney, skeletal muscle, liver, lung,
placenta, brain, or heart (Figure 5). A band of similar size
was detected by Northern blotting of RNA from [...]
duodenum with a bovine enterokinase cDNA probe (data
not shown). These results are consistent with [...]
of enterokinase activity (Pavlov, 1902; Lojda & Malis, [...])
and antigen (Miyoshi et al., 1990) almost exclusively in
enterocytes of proximal small intestine.
Chromosome Localization of the Human Enterokinase
Gene. Fluorescent in situ hybridization was used to
physically localize the human enterokinase gene. To obtain
adequate hybridization signal, the inserts of cDNA clones
HEK1 and HEK6 were mixed, thereby including [...]
of the cDNA sequence. The DNA was labeled with biotin
Figure 6: Fluorescent in situ hybridization localization of the enterokinase gene to human chromosome 21q21. Five metaphase spreads are
shown. Arrows indicate biotin-labeled probe hybridization (color) and the position of the same spreads banded using Giemsa dye. Also
shown is an idiogram of chromosome 21 with band q21, to which the probes hybridize, indicated by an arrowhead.



and hybridized to prometaphase spreads of human
chromosomes. Labeled DNA was detected with fluorescein
isothiocyanate-conjugated avidin DCS and amplified with
fluorescein isothiocyanate-conjugated goat anti-avidin D
antibodies. Independent metaphase spreads were analyzed, and
five representative spreads are shown (Figure 6). Specific
hybridization of the enterokinase cDNA probe was observed
on chromosome 21; no consistent secondary hybridization
was detected. 4',6-Diamidino-2-phenylindole dihydrochloride
staining and Giemsa banding confirmed the location of the
hybridization signals on chromosome 21 band q21.

The human enterokinase locus appears to be close to the
gene for β-amyloid precursor protein at 21q21.2 (Nizetic et
al., 1994), which is mutated in one form of inherited



Alzheimer disease (Goate et al., 1991), and to the gene for
superoxide dismutase at 21q22.1, which is mutated in familial
amyotrophic lateral sclerosis (Rosen et al., 1993).
Enterokinase also is in or near a region implicated in specific
features of Down syndrome, although the precise locations
of chromosome 21 segments that contribute to Down
syndrome remain unknown (Korenberg et al., 1994). The
cloning of cDNA for human enterokinase will enable fine
structure physical and genetic mapping of these loci and the
characterization of mutations in congenital enterokinase
deficiency (Hadorn et al., 1969; Haworth et al., 1971). These
clones also facilitate the study of biosynthetic targeting to
apical brush border membranes, zymogen activation, and
substrate specificity of human enterokinase.
cDNA clones coding for a new serine protease (hepsin) have been
isolated from [...] human liver and hepatoma cell line mRNA. The total
length of the cDNA is approximately 1.8 kilobases and includes a 5'
untranslated region, a 1251-nucleotide coding region [...], a 3'
untranslated region, and a poly(A) tail. The amino acid sequence encoded
by the cDNA for hepsin shows a high degree of identity to pancreatic
trypsin and other serine proteases present in plasma. It also exhibits
features characteristic of zymogens of serine proteases in that it
contains a cleavage site for protease activation and the highly conserved
regions surrounding His, Asp, and Ser [...]. In addition, hepsin lacks a
typical amino-terminal signal peptide. [...] the protein sequence,
however, revealed a very hydrophobic region of 27 amino acids starting
18 residues downstream from the apparent initiator Met. This region may
serve as an internal signal sequence and a transmembrane domain. This
putative transmembrane domain could be involved in anchoring hepsin to
the cell membrane and orienting it in such a manner that its carboxyl
terminus, containing the catalytic domain, is extracellular.



Many biological processes which require specific, limited
proteolysis are mediated by a member(s) of the serine protease
family of proteolytic enzymes. These proteases exist as single-
or two-chain zymogens that are activated by specific and
limited proteolytic cleavage (Neurath & Walsh, 1976). They
contain three principal active-site amino acids (His, Asp, and
Ser) that participate in peptide bond hydrolysis (Blow et al.,
1969). In addition, they share considerable structural
similarities in their catalytic chains.

Among the best-studied serine proteases are those that are
found in plasma. These enzymes are involved in processes such
as blood coagulation (Davie et al., 1979), fibrinolysis
(Christman et al., 1977; Collen, 1980), and complement
activation (Reid & Porter, 1981). The active form of most
plasma serine proteases consists of two polypeptide chains held
together by a disulfide bond(s): a highly conserved catalytic
chain derived from the carboxyl-terminal end of the precursor
polypeptide, and a unique noncatalytic chain derived from the
amino-terminal portion of the polypeptide chain. The presence
of a noncatalytic chain(s) distinguishes the plasma serine
proteases from the digestive proteases of the pancreas. By
mediating interactions with other proteins or surfaces,
noncatalytic chains influence the action of plasma serine proteases
on their selected substrates. The biosynthesis of most of the
serine proteases present in plasma occurs in the liver. Although
at least 20 different serine proteases synthesized in the liver
have been described thus far, it is quite likely that many more
exist.

Recent reports have identified a number of new serine 
proteases produced in different tissues and cell types. Cook 

This work was supported in part by research grants (HL 16919 and
HL 31511) and a postdoctoral fellowship (GM 09J 18 to S.P.L.) from
the National Institutes of Health.
University of Washington.

Present address: Department of Biochemistry, Medical College of
Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226.
ZymoGenetics, Inc.

Present address: Department of Human Genetics, 4708 Medical
[...], University of Michigan Medical School, Ann Arbor, MI



et al. (1985, 1987) have described a cDNA coding for a new
serine protease that is expressed during adipocyte
differentiation. Gershenfeld and Weissman (1986) and Lobe et al.
(1986) have cloned cDNAs coding for new serine proteases
expressed by cytotoxic T lymphocytes. Newly characterized
proteins have also been isolated from cytotoxic T lymphocytes
(Pasternack et al., 1986; Young et al., 1986; Masson &
Tschopp, 1987), liver (Tanaka et al., 1986), ovary (Eisenhauer
& McDonald, 1986), pituitary gland (Cromlish et al., 1986),
embryo fibroblast cells (Billings et al., 1987), seminal plasma
(Watt et al., 1986), submaxillary gland (Lundgren et al.,
1984), and tumor cells (LaBombardi et al., 1983) that exhibit
properties typical of serine proteases. Additional new proteases
have been reported, but not all have been identified as
belonging to the serine protease family. Although the majority
of serine proteases are synthesized with signal peptides that
direct their secretion outside of the cell, some of the new serine
proteases recently reported may be associated with cell
membranes (LaBombardi et al., 1983; Tanaka et al., 1986).

As a general approach to isolating cDNAs coding for serine
proteases synthesized in the liver, a strategy was chosen that
involved screening a human liver cDNA library with a
synthetic oligodeoxynucleotide probe coding for a highly conserved
amino acid sequence known to exist in a number of different
serine proteases. In this manner, recombinant clones were
isolated that contained cDNA inserts coding for serine
proteases synthesized in the liver, including human factor IX
(Kurachi & Davie, 1982), prothrombin (Degen et al., 1983),
and complement C1r (Leytus et al., 1986a). In this paper,
we report the isolation and characterization of the cDNA
coding for a new trypsin-like serine protease. This
hepatocyte-expressed protease has been called hepsin.

Experimental Procedures 

DNA restriction endonucleases and DNA modification
enzymes were purchased from Bethesda Research Laboratories
or New England Biolabs. 32P-Labeled nucleotides used in
nick-translating cDNA fragments (Maniatis et al., 1982) and
5'-end-labeling synthetic oligodeoxynucleotides (Maxam &
Gilbert, 1980) were obtained from New England Nuclear.
[α-35S]dATP and nonradioactive nucleotides used for DNA
sequencing were products of Amersham and Pharmacia,
respectively. A mixture of tetradecadeoxynucleotides (used to
screen the plasmid cDNA library) was synthesized by P-L
Biochemicals and contained the following sequence:

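5'-C-C-A-G-C-G-C-A-G-A-A-C-A-T-3'
(G and I are printed above this sequence as alternative bases at degenerate positions; their exact positions are not legible in the scan.)

LEYTUS ET AL.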

A cDNA library prepared from human liver mRNA was
kindly provided by Drs. S. L. Woo and T. Chandra of the
Baylor College of Medicine. The library contained cDNA
inserted into the PstI site of plasmid pBR322 (Chandra et al.,
1983). In addition, a cDNA library prepared from human
hepatoma cell line (Hep G2) mRNA was also used. This
library contained cDNA inserted into the EcoRI site of
bacteriophage vector λgt11 (Hagen et al., 1986). The plasmid
library was prepared for colony hybridization (Gergen et al.,
1979) and the λgt11 library for plaque hybridization (Benton
& Davis, 1977) according to established procedures.
Hybridization conditions using 32P-labeled synthetic
oligodeoxynucleotide and cDNA probes were the same as described
previously (Leytus et al., 1986a).

DNA from recombinant phage was prepared according to
Maniatis et al. (1982) with minor modifications (Leytus et
al., 1986a). cDNA inserts were released from the recombinant
phage DNA by digestion with EcoRI, and a selected number
of these were then subcloned into the EcoRI site of a pUC
plasmid vector (Vieira & Messing, 1982). Plasmid DNA was
prepared by a modification of the alkaline extraction procedure
of Birnboim and Doly (1979), essentially as described by
Micard et al. (1985).

Selected fragments from restriction enzyme digests of re- 
combinant plasmids were subcloned into M13 bacteriophage 
vectors by the method of Messing (1983). These were then 
sequenced by the dideoxy chain terminator method of Sanger 
et al. (1977), employing the modifications described by Biggin 
et al. (1983). DNA sequences were analyzed by the computer
program genepro (Version 4.0, Riverside Scientific Enter- 
prises, Seattle, WA). Protein sequences were also analyzed 
by using genepro and the computer programs search 
(Dayhoff. 1979) and align (Dayhoff, 1983). 

Results 

A plasmid cDNA library prepared from human liver
mRNA and containing approximately 14000 recombinant
colonies was screened with a mixture of synthetic
tetradecadeoxynucleotide sequences (Leytus et al., 1986a). These
sequences were complementary to the mRNA sequence coding
for the amino acids Met-Phe-Cys-Ala-Gly. The sequence
Met-X-Cys-Ala-Gly is highly conserved in many serine
proteases and is found approximately 15 amino acids prior to the
active-site serine. Among the 31 strongly hybridizing clones
that were initially identified, 14 contained cDNA inserts coding
for prothrombin, 9 for C1r, 2 for factor IX, and 5 for an
unidentified protein whose cDNA contained a single nucleotide
mismatch with the hybridization probe (Leytus et al., 1986a).
The last clone (designated HUW1250) coded for a serine
protease and has now been examined more extensively.
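The probe design described above can be sketched as enumerating every mRNA sequence that could encode Met-Phe-Cys-Ala-Gly, truncating it to 14 nucleotides (the last Gly codon contributes only GG), and taking the reverse complement. This sketch assumes full degeneracy. The mixture actually used may have reduced degeneracy with inosine, as the G and I marks printed with the probe sequence suggest, so the count below is an illustration, not the composition of the authors' mixture.

```python
from itertools import product

# Standard genetic code entries for the probe's target residues.
CODONS = {
    "M": ["ATG"],
    "F": ["TTT", "TTC"],
    "C": ["TGT", "TGC"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def antisense_probes(peptide: str, length: int) -> list[str]:
    """All antisense oligos complementary to every mRNA coding for `peptide`,
    truncated to the first `length` nucleotides of the coding sequence."""
    sense = {"".join(combo)[:length] for combo in product(*(CODONS[aa] for aa in peptide))}
    return sorted(reverse_complement(s) for s in sense)


probes = antisense_probes("MFCAG", 14)
print(len(probes))                 # size of the fully degenerate mixture
print("CCAGCGCAGAACAT" in probes)  # the sequence printed in the text
```

Because the 14-mer stops after GG, the third base of the Gly codon never varies, so the mixture size is set by Phe, Cys, and Ala alone (2 × 2 × 4 = 16 sequences).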

By Southern transfer and hybridization analysis, the site
in HUW1250 responsible for hybridizing to the synthetic
oligodeoxynucleotide probe was localized, and the nucleotide
FIGURE 1: Restriction endonuclease map of the cDNA coding for
human hepsin. The schematic representation of several of the cDNA
inserts and a summary of the strategy used to sequence portions of
these inserts are shown. The solid, open, and slashed regions represent
5' untranslated, coding, and 3' untranslated regions, respectively, within
a cDNA insert. The stippled regions represent improperly spliced
intronic sequence found in clones HepG2UW17 and HepG2UW2.
Arrows indicate the direction and extent of sequencing obtained from
the M13 subclones. The numbers at the 5' end of each insert refer
to positions within the nucleotide sequence of the cDNA (see Figure
2). Sequencing strategy for the apparent intron fragments is not shown.

sequence of this region was determined. A DNA sequence
was found that matched perfectly with one of the sequences
in the oligodeoxynucleotide mixture used as a probe. Closely
following the DNA sequence that coded for Met-Phe-Cys-Ala-Gly
and in the same reading frame was an amino acid
sequence of Gly-Asp-Ser-Gly-Gly-Pro. The latter amino acid
sequence represents the most highly conserved region in serine
proteases and contains the active-site Ser residue. Since the
deduced amino acid sequence flanking this highly conserved
region did not match any known serine protease, it
appeared that HUW1250 coded for a new serine protease. This
new enzyme has been called hepsin.

Following the sequencing strategy shown in Figure 1, the complete nucleotide sequence of HUW1250 was determined [nucleotides 585-1783 (Figure 2)]. A number of other amino acid sequences that are highly conserved in most serine proteases were also present in hepsin. These included an Arg-Ile-Val-Gly-Gly activation site region (residues 162-166), a Thr-Ala-Ala-His-Cys active-site His region (residues 200-204), an Asp-Ile-Ala-Leu-Val active-site Asp region (residues 257-261), the Met-Phe-Cys-Ala-Gly oligodeoxynucleotide probe site (residues 336-340), and the Gly-Asp-Ser-Gly-Gly-Pro active-site Ser region (residues 351-356). Furthermore, the relative positions of all of these conserved regions in hepsin were the same as in other serine proteases. Although HUW1250 contained a poly(A) tail, it evidently did not represent a full-length cDNA, since the nucleotide sequence 5' to the sequence coding for the Arg-Ile-Val-Gly-Gly activation site did not code for a Met residue that could serve as a site for initiation of translation.
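The conserved segments listed above can be located in a deduced protein sequence with a simple string search. The sketch below looks for the exact hepsin versions of these segments in a one-letter protein string; `HEPSIN_MOTIFS`, `find_motifs`, and `in_canonical_order` are hypothetical names, and literal matching is an assumption that only suits hepsin-like sequences, since other serine proteases differ at some of these positions.

```python
import re

# Conserved segments named in the text, as they occur in hepsin (one-letter codes).
HEPSIN_MOTIFS = {
    "activation": "RIVGG",  # Arg-Ile-Val-Gly-Gly, residues 162-166
    "His": "TAAHC",         # Thr-Ala-Ala-His-Cys, residues 200-204
    "Asp": "DIALV",         # Asp-Ile-Ala-Leu-Val, residues 257-261
    "probe": "MFCAG",       # Met-Phe-Cys-Ala-Gly, residues 336-340
    "Ser": "GDSGGP",        # Gly-Asp-Ser-Gly-Gly-Pro, residues 351-356
}


def find_motifs(protein: str) -> dict[str, list[int]]:
    """Return the 1-based start positions of each motif in `protein`."""
    hits = {}
    for name, motif in HEPSIN_MOTIFS.items():
        # A zero-width lookahead also reports overlapping occurrences.
        pattern = f"(?={re.escape(motif)})"
        hits[name] = [m.start() + 1 for m in re.finditer(pattern, protein.upper())]
    return hits


def in_canonical_order(hits: dict[str, list[int]]) -> bool:
    """True if every motif is present and their first occurrences follow the listed order."""
    firsts = [positions[0] for positions in hits.values() if positions]
    return len(firsts) == len(hits) and firsts == sorted(firsts)
```

Applied to the full deduced hepsin sequence, one would expect the start positions 162, 200, 257, 336, and 351 given in the text.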

FIGURE 2: Nucleotide sequence of the cDNA coding for human hepsin. The sequence was determined by analysis of the cDNA inserts shown in Figure 1. The predicted amino acid sequence is shown above the DNA sequence. The solid, inverted triangle marks the location of the inserted sequence found in clones HepG2UW17 and HepG2UW2 (see Figure 1). This sequence is not included in Figure 2. The boxed amino acid sequence represents a potential transmembrane domain. The solid arrow identifies an Arg-Ile bond that is probably cleaved when the inactive zymogen is converted to an active protease. The active-site His, Asp, and Ser residues are circled. The underlined nucleotide sequence is the site responsible for hybridizing to the synthetic oligodeoxynucleotide probe.

In order to isolate clones with larger cDNA inserts, approximately 960,000 recombinants from a Hep G2 cell line cDNA library (constructed in bacteriophage λgt11) were screened by using the entire cDNA insert from HUW1250 as a hybridization probe. Approximately 70 positive clones were identified in the initial screening, and most of these were plaque purified. Phage DNA was then prepared from 19 of these clones.

Digestion of the recombinant phage DNAs with EcoRI released inserts that ranged in size from approximately 800 to 1800 base pairs (bp). Two of these inserts (HepG2UW7 and HepG2UW20) were selected for further analysis. A 160 bp EcoRI-Mol fragment derived from the extreme 5' end of HepG2UW7 was then employed as a hybridization probe, and the original 70 positives were rescreened. Subsequently, five additional clones, designated HepG2UW2, HepG2UW17, HepG2UW19, HepG2UW61, and HepG2UW63, were also selected for DNA sequence analysis. A restriction enzyme map for the seven cDNA inserts obtained from the Hep G2 library is shown in Figure 1. The strategy used to determine the cDNA sequence of hepsin from the various clones is also described in Figure 1.

FIGURE 3: Hydropathy analysis of the deduced amino acid sequence of hepsin. The method of Kyte and Doolittle (1982) was employed, using a window of 20 residues. The peak spanning residues 18-44 represents the putative transmembrane domain.

The complete nucleotide sequence of the cDNA coding for hepsin is shown in Figure 2, along with the deduced amino acid sequence. The total length of the cDNA was 1783 bp. This is consistent with the size of the mRNA for hepsin present in Hep G2 cells as determined by Northern blot analysis (data not shown). The cDNA includes 245 nucleotides of untranslated sequence at the 5' end, 1251 nucleotides coding for a protein of 417 amino acids, a TGA stop codon, and 284 nucleotides of untranslated sequence at the 3' end. The ATG codon at positions 246-248 was assigned as the initiator Met codon, since it is the most 5'-proximal Met codon downstream of the TGA stop codon at positions 138-140. The "first ATG rule" reportedly holds for the vast majority of eucaryotic mRNAs (Kozak, 1984). The nucleotide sequence surrounding the tentative initiator Met codon is GACATGG. This differs somewhat from the optimal sequence ACCATGG for translation initiation sites proposed by Kozak (1986). A purine is present, however, in the critical position three nucleotides upstream of the ATG codon. The length of 5' untranslated regions in eucaryotic mRNAs can vary, with the majority (~70%) being in the range of 20-^0 nucleotides (Kozak, 1984). The 245 nucleotides upstream from the apparent initiator Met represent a rather long 5' untranslated region for hepsin. Although the precise role of the 5' untranslated sequence in mRNAs has not been established, it has been suggested that secondary structure(s) in long 5' untranslated regions may be involved in the regulation of transcription or translation (Kozak, 1984).
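The region lengths given above can be cross-checked with a few lines of arithmetic, and the initiation context can be compared position by position with the Kozak optimum. The sketch below uses only the numbers and sequences quoted in the text; `kozak_report` is a hypothetical helper, and numbering the context from -3 to +4 relative to the A of ATG (+1) is the usual convention assumed here.

```python
# Region lengths (nt) of the hepsin cDNA as reported in the text.
UTR5, CDS, STOP, UTR3 = 245, 1251, 3, 284

total_length = UTR5 + CDS + STOP + UTR3
residues, remainder = divmod(CDS, 3)
atg_start = UTR5 + 1   # 1-based position of the A of the initiator ATG
upstream_tga = 138     # 1-based start of the upstream TGA (positions 138-140)
same_frame = (atg_start - upstream_tga) % 3 == 0

print(total_length, residues, remainder, atg_start, same_frame)  # 1783 417 0 246 True

KOZAK_OPTIMUM = "ACCATGG"            # Kozak (1986), positions -3..+4
OFFSETS = (-3, -2, -1, 1, 2, 3, 4)   # numbering relative to the A of ATG (+1)


def kozak_report(context: str) -> tuple[list[int], bool]:
    """Compare a 7-nt context (positions -3..+4) with ACCATGG.

    Returns the positions that differ from the optimum and whether
    position -3 carries a purine (A or G)."""
    context = context.upper()
    if len(context) != 7 or context[3:6] != "ATG":
        raise ValueError("expected a 7-nt context of the form NNNATGN")
    diffs = [pos for pos, a, b in zip(OFFSETS, context, KOZAK_OPTIMUM) if a != b]
    return diffs, context[0] in "AG"


print(kozak_report("GACATGG"))  # ([-3, -2], True)
```

The check confirms that the upstream TGA lies in the same reading frame as the assigned ATG, and that the hepsin context differs from the optimum only at -3 and -2 while keeping a purine at -3.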

In contrast to most other serine proteases, the cDNA se-
quence coding for hepsin did not predict the presence of a
typical signal peptide. However, hydropathy analysis (Kyte
& Doolittle, 1982) revealed the presence of a single, very
hydrophobic domain of 27 residues near the amino terminus 
of the molecule (residues 18-44, Figure 3). This hydrophobic 
domain, starting 18 residues downstream from the apparent 
initiator Met, contains no charged amino acids and is suffi- 
ciently long and nonpolar to span a lipid bilayer. Furthermore, 
this potential membrane-spanning domain is flanked on either 
side by charged amino acids, which may serve to help anchor 
the protein in a membrane. 
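The hydropathy analysis referred to above averages per-residue hydropathy values over a sliding window. The sketch below uses the Kyte and Doolittle (1982) scale with the 20-residue window stated in the Figure 3 caption; the cutoff of 1.6 in `windows_above` is an assumption (a value commonly quoted for flagging candidate membrane-spanning segments), not a parameter reported in this study, and both function names are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 20) -> list[float]:
    """Mean hydropathy of every full window; entry k covers residues k+1..k+window."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if len(values) < window:
        return []
    running = sum(values[:window])
    profile = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one residue
        profile.append(running / window)
    return profile


def windows_above(profile: list[float], threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean hydropathy exceeds `threshold`."""
    return [k + 1 for k, value in enumerate(profile) if value > threshold]
```

For a 27-residue hydrophobic stretch such as residues 18-44, several overlapping 20-residue windows would score high, producing a single broad peak like the one described for Figure 3.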

From restriction enzyme mapping and DNA sequencing, 
it was found that clones HepG2UW17 and HepG2UW2 had
additional sequences near their 5' ends that were not present 
in the other cDNA inserts. Beginning at position 192 in the 
nucleotide sequence, clone HepG2UW17 contained an addi- 
tional 580 bp of DNA. This sequence was as follows: 
GTAAGGACAAGGGCCCCCAGACTCACAGTTCCA- 
GCCCTGAGGACAGGGGTTCCCTCATCCCCCCAC- 
CCAGCCTAATGCCCACCTCCTAATAGAGGGGTT- 
CCTGGGGACCTGAAGAGGGGGCACTATGACGT- 
CTCCCCAAGCACCTAGGTC3TTCTGTCCTGCTCT- 
TCCTTCAGACTCAGCCGTTGGACCCCAGTCCTTT- 
CCTCCCCAGACCCAGGAGTTCCAGCCCTCAGGC- 
CCCTCCTCCCTCATACTAGGGAGTCCTGGCCCO 
CAAATTCCTCCTTTCCCAAGACTTATGATTTCA- 
GGTCCTCAGCTGTCTCCTCCCTCAAACCGGGAT- 
CCTTCAGTCCCCTGCTCCACCAGGCTCAGGCATG-
GGGGTCCCCATCCCTGCAAATCCAGGCGTCCCC- 
CCGCTGCTGGTCAGACACTGACCCCATCCTTGA- 
ACCCAGCCCAATCTGCGTCCGTGATCACGGCGT- 
GCTCTGGCCAAGGCCCAGTCCCTACAGCCTGCC- 
TGGATGGACGCCTGGGACTGGGGGCGCCAGGA- 
CTGGGCTGGGCTGGGCTCCCCCAGGCCCTGCCT- 
CCCCGTCCATCTCCTCACAG. Analysis of this sequence suggests that this insertion probably represents an unspliced intron or a remnant of an intron. The underlined hexanucleotide sequences at the beginning and end of this sequence, GTAAGG and TCACAG, respectively, conform to consensus hexanucleotide sequences found at the 5' and 3' ends of introns adjacent to intron/exon splice junctions (Breathnach & Chambon, 1981; Nevins, 1983). The GTAAGG donor site and the TCACAG acceptor site are probably used for splicing out this intronic sequence in the majority of the mRNA molecules coding for hepsin. In the case of clone HepG2UW17, this sequence was not spliced out when the mRNA molecule that gave rise to this particular insert was being processed. The additional sequence near the 5' end of clone HepG2UW2 is also probably due to improper splicing of the same intron. In this case, the cellular splicing apparatus apparently used the proper donor site (GTAAGG, underlined above) but an alternative acceptor site (ACCCAG, underlined above). This removed most of the intronic sequence but left behind 145 nucleotides. With the exception of these two probable splicing errors, no other differences were detected among the cDNA inserts in regions where overlapping sequences were obtained.
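The splicing interpretation above rests on two checks: whether the insertion has GT...AG boundaries, and how much sequence remains if a downstream alternative acceptor is used. The sketch below implements both under stated assumptions; `has_gt_ag_ends` and `retained_length` are hypothetical helpers, and treating the whole acceptor hexamer (which ends in AG) as intronic is an assumption about where the 3' cut falls.

```python
def has_gt_ag_ends(insert: str) -> bool:
    """True if an inserted segment starts with GT (5' donor) and ends with AG (3' acceptor)."""
    s = insert.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def retained_length(insert: str, alt_acceptor: str) -> int:
    """Nucleotides left behind if splicing uses the normal donor at the start of
    `insert` but the first downstream copy of `alt_acceptor` as the 3' splice site.

    The acceptor hexamer itself is counted as intronic (it is removed)."""
    s, acc = insert.upper(), alt_acceptor.upper()
    i = s.find(acc, 2)  # search downstream of the GT dinucleotide
    if i == -1:
        raise ValueError("alternative acceptor not found")
    return len(s) - (i + len(acc))
```

With the full 580-bp insertion transcribed as a string, `retained_length(insert, "ACCCAG")` gives the length of the remnant predicted for a clone like HepG2UW2; if the acceptor occurs more than once, the first downstream copy is used.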

At the 3' end of the cDNA, the sequence AATAAA was present 14 nucleotides upstream from the polyadenylation site. This sequence, which generally occurs 10-30 nucleotides upstream from the poly(A) tail, apparently functions as a signal for polyadenylation, either by specifying the proper cleavage site of mRNA transcripts or by serving as a recognition sequence for poly(A) polymerase (Proudfoot & Brownlee, 1976; Nevins, 1983).
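The spacing between the AATAAA signal and the polyadenylation site can be measured directly from a 3' untranslated sequence. The sketch below assumes the input string ends exactly at the polyadenylation site (poly(A) tail removed) and counts the nucleotides between the end of the last AATAAA and that site; this distance convention and both function names are assumptions for illustration.

```python
from typing import Optional


def polya_signal_distance(utr3: str, signal: str = "AATAAA") -> Optional[int]:
    """Distance (nt) from the end of the last `signal` to the 3' end of `utr3`.

    `utr3` is assumed to end at the polyadenylation site, i.e. with the
    poly(A) tail already trimmed off."""
    s = utr3.upper()
    i = s.rfind(signal)
    if i == -1:
        return None
    return len(s) - (i + len(signal))


def in_typical_range(distance: Optional[int], low: int = 10, high: int = 30) -> bool:
    """Check the 10-30 nt spacing that the text describes as typical."""
    return distance is not None and low <= distance <= high
```

A reported spacing of 14 nucleotides falls well within the typical 10-30 nt window.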

The base composition of the cDNA coding for hepsin was 
particularly rich in G and C. The total nucleotide composition 
was calculated to be 17.0% A, 19.1% T, 31.2% G, and 32.5%
C. The 245 bp 5' untranslated region contained an even higher 
content of C, and its base composition was calculated to be 
17.1% A, 12.6% T, 28.5% G, and 41.6% C. 
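Base composition figures like these are straightforward counts. The sketch below computes per-base percentages for any DNA string and sums G and C; `base_composition` and `gc_percent` are hypothetical helper names, and characters other than A, T, G, and C are simply ignored.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Percentage of A, T, G and C in `seq` (other characters are ignored)."""
    counts = Counter(b for b in seq.upper() if b in "ATGC")
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no A/T/G/C characters in sequence")
    return {b: round(100 * counts[b] / n, 1) for b in "ATGC"}


def gc_percent(composition: dict[str, float]) -> float:
    """Combined G + C percentage from a composition dictionary."""
    return round(composition["G"] + composition["C"], 1)


# With the reported whole-cDNA figures, G + C comes to 63.7%.
reported = {"A": 17.0, "T": 19.1, "G": 31.2, "C": 32.5}
print(gc_percent(reported))  # 63.7
```

By the same arithmetic, the reported 5' untranslated region is 70.1% G + C, which quantifies how much richer it is in G and C than the cDNA as a whole.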

Besides the open reading frame that codes for hepsin, an unusually long open reading frame was observed in the inverted sequence of this cDNA. This open reading frame spanned 1353 nucleotides (nucleotides 105-1457 in the inverted sequence). The amino acid sequence deduced from this open reading frame was used in a search of the protein sequence database (National Biomedical Research Foundation, Washington, DC), but little significant sequence identity was found with any other known protein. Furthermore, there were no Met residues in the deduced amino acid sequence that could serve as a start site for translation.
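An open reading frame on the opposite strand can be found by reverse-complementing the cDNA and scanning all three frames for the longest stretch free of stop codons. The sketch below treats the "inverted sequence" as the reverse complement, which is an assumption, and defines a stretch as running between stops without including them; the text does not say which convention produced the 1353-nt figure. `count_atg` then checks for possible in-frame Met start codons. All names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_stop_free_stretch(seq: str) -> tuple[int, int]:
    """Longest run of sense codons between stops in the three forward frames.

    Returns (length in nt, 1-based start). Stop codons are not included,
    and runs at either end are bounded by the ends of the sequence."""
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        run_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if i - run_start > best[0]:
                    best = (i - run_start, run_start + 1)
                run_start = i + 3
        # Close the final run at the last complete codon of this frame.
        last_codon_end = frame + ((len(seq) - frame) // 3) * 3
        if last_codon_end - run_start > best[0]:
            best = (last_codon_end - run_start, run_start + 1)
    return best


def count_atg(seq: str, start: int, length: int) -> int:
    """Count in-frame ATG codons in the stretch seq[start-1 : start-1+length]."""
    stretch = seq.upper()[start - 1:start - 1 + length]
    return sum(1 for i in range(0, len(stretch) - 2, 3) if stretch[i:i + 3] == "ATG")
```

For a full cDNA string `cdna` (hypothetical variable), `longest_stop_free_stretch(reverse_complement(cdna))` locates the longest such frame, and a `count_atg` result of 0 would match the observation that no Met codon is available to start translation.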

Discussion 

Analysis of the cDNA sequence presented for hepsin in- 
dicates that it codes for a protein that is a member of the serine 
protease family. The cDNA coding for hepsin was isolated 
from cDNA libraries prepared from human liver and Hep G2 
cell line mRNA. Preliminary data by Northern analysis in- 
dicate that the mRNA coding for hepsin is also expressed in 
a human osteosarcoma cell line. It is either not expressed or 
expressed only at very low levels in human endothelial cells,
smooth muscle cells, and skin fibroblasts, as determined by 
Northern analysis. 

The amino acid sequence of hepsin, deduced from the nucleotide sequence of its cDNA, is very similar to those of other serine proteases, especially in those regions that are highly conserved among this group of enzymes. It contains His, Asp, and Ser residues at positions 203, 257, and 353, respectively. These amino acids are analogous to the His57, Asp102, and Ser195 residues in chymotrypsin that constitute the catalytic triad essential for enzymatic activity (Blow et al., 1969). The presence of an Asp (as opposed to a Ser) at position 347 suggests that hepsin possesses a substrate specificity similar to that of trypsin (Steitz et al., 1969; Hartley, 1970). This residue is thought to contribute to substrate binding in the active site of serine proteases and, for trypsin-like serine proteases, results in a preference for basic amino acids.

FIGURE 4: Comparison of the carboxyl-terminal end of the noncatalytic chain of hepsin with corresponding regions in the noncatalytic chains of factor X (McMullen et al., 1983), protein C (Foster & Davie, 1984), factor VII (Hagen et al., 1986), and factor IX (Kurachi & Davie, 1982). Gaps have been inserted to bring the protein sequences into better alignment. The numbers in parentheses refer to the location of the sequence in that particular protein. Amino acids are boxed if they are found at the same location in hepsin and one or more of the other proteins.

The cDNA sequence predicts an Arg-Ile-Val-Gly-Gly activation site sequence (residues 162-166). This suggests that hepsin is synthesized as an inactive zymogen that is converted to an active serine protease by cleavage of the Arg162-Ile163 peptide bond. The resulting active serine protease would consist of two chains: a noncatalytic chain (residues 1-162) derived from the amino-terminal end of the zymogen and a catalytic chain (residues 163-417) derived from the carboxyl-terminal end. By analogy with the various plasma serine proteases, the Cys residues at positions 153 and 277 in the noncatalytic and catalytic chains, respectively, could be expected to form a disulfide bond that holds the two chains together. A computer search of the protein sequence database (National Biomedical Research Foundation, Washington, DC) showed that a portion of hepsin differs substantially from all serine proteases for which sequence data are available. These data also showed that the noncatalytic chain is unique among known protein sequences except for its extreme carboxyl-terminal region. This portion of the noncatalytic chain shares some sequence similarity with corresponding regions in four of the vitamin K-dependent serine proteases (Figure 4). Conversely, the catalytic chain of hepsin exhibits a high degree of similarity with the catalytic chains of other serine proteases (Figure 5).

When the primary structures of the catalytic chains of 
different serine proteases are compared, the pattern that 
emerges is one of small stretches of highly similar sequence 
occurring at various intervals along the polypeptide chain 
(Hartley & Shotton, 1971). Furthermore, internal residues
are much more highly conserved than external ones. In their
analysis of the catalytic chains of several serine proteases, Furie 
et al. (1982) identified seven conserved regions separated by 
six variable regions. The variable regions, which show little 
conservation of sequence, in addition to containing short de- 
letions and insertions, are thought to be located on the surface 
of the protein. This helps to explain why the internal structures 
and active sites of different serine proteases appear similar, 
whereas their surfaces, which play a major role in determining 
their unique substrate specificities, vary considerably. By 
comparing the amino acid sequence of the catalytic chain of 
hepsin with those of other serine proteases (Figure 5), it is 
apparent that hepsin also follows the same pattern of conserved
and variable regions. 
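Similarity percentages such as those quoted for Figure 5 come from comparing aligned sequences column by column. The sketch below computes percent identity over the alignment columns where neither sequence has a gap; the text does not state which denominator the authors used, so this gap-free-column convention and the function name `percent_identity` are assumptions.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns in which neither
    sequence has a gap. `a` and `b` must be rows of the same alignment."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


print(percent_identity("IVGG-R", "IVGGAK"))  # 80.0
```

Because the variable regions carry most of the insertions and deletions, the choice of denominator (gap-free columns, full alignment length, or the length of one sequence) can change the percentage noticeably.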

The highly basic sequence Arg-Arg-Lys (residues 155-157) just prior to the apparent activation site is similar to the basic sequences that also precede the activation sites in human factor X (Leytus et al., 1984) and protein C (Foster & Davie, 1984). Factor X and protein C are synthesized as single-chain precursors and are converted to two-chain zymogens by cleavage and release of these basic residues. Subsequent cleavages at the activation sites for factor X and protein C release short activation peptides and result in the generation of an active serine protease. If the analogy is extended to include hepsin, it seems possible that this protein may also exist as a two-chain zymogen that releases a short peptide (e.g., Leu-Pro-Val-Asp-Arg) upon its conversion to an active enzyme.

Compared with other serine proteases, the number and 
positions of 9 out of the 10 cysteine residues in the catalytic 
chain of hepsin are highly conserved. On the basis of the 
known disulfide bridge arrangement in chymotrypsin (Keil et al., 1963; Brown & Hartley, 1966), trypsin (Kauffman, 1965), prothrombin (Magnusson et al., 1975), plasmin (Sottrup-Jensen et al., 1978; Wiman, 1977), and factor X (Hojrup & Magnusson, 1987), and by analogy with other serine proteases, four intrachain disulfide bonds at cysteine pairs 188/204, 291/359, 322/338, and 349/381 would be expected. In addition, Cys277 is probably involved in a disulfide linkage with the noncatalytic chain. The remaining Cys residue has no analogous counterpart in other serine proteases. One possibility is that this extra Cys may participate in an interchain disulfide bridge between two monomers of hepsin, analogous to that proposed for factor XI (Fujikawa et al., 1986). In the noncatalytic chain of hepsin, the cDNA sequence predicts the presence of nine Cys residues. Cys153 is probably involved in the disulfide linkage with the catalytic chain. This leaves an even number of Cys residues in the noncatalytic chain that could form intrachain disulfide bonds.
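The two-chain model and the cysteine bookkeeping above can be expressed as a short calculation: split the zymogen after the P1 Arg, list the Cys positions in each chain in zymogen numbering, and count how many remain once interchain links and intrachain pairs are assigned. The helpers below are hypothetical; the counts (10 Cys in the catalytic chain with four pairs and one interchain link, nine in the noncatalytic chain with one interchain link) are taken from the text.

```python
def split_zymogen(zymogen: str, p1: int) -> tuple[str, str]:
    """Split after residue `p1` (1-based): returns (noncatalytic, catalytic) chains."""
    return zymogen[:p1], zymogen[p1:]


def cys_positions(chain: str, offset: int = 0) -> list[int]:
    """1-based Cys positions, shifted by `offset` to keep zymogen numbering."""
    return [i + 1 + offset for i, aa in enumerate(chain.upper()) if aa == "C"]


def unpaired_after(n_cys: int, interchain: int, intrachain_pairs: int) -> int:
    """Cys residues left over once interchain links and intrachain pairs are assigned."""
    left = n_cys - interchain - 2 * intrachain_pairs
    if left < 0:
        raise ValueError("more Cys assigned than present")
    return left


print(unpaired_after(10, 1, 4))  # catalytic chain: the one extra Cys -> 1
print(unpaired_after(9, 1, 0))   # noncatalytic chain: 8 left, an even number
```

For hepsin one would call `split_zymogen(sequence, 162)` and `cys_positions(catalytic, offset=162)` so that positions such as Cys277 keep the numbering used in the text.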

From crystallographic and kinetic studies of chymotrypsin
and trypsin and from knowledge of their primary structures,
it has been possible to identify residues in these enzymes that
are involved in substrate binding and catalysis [reviewed in
Birktoft et al. (1970), Hartley and Shotton (1971), and Kraut
(1977)]. Since some of these residues are essential for proper
function, it was of interest to make a more detailed comparison 
with hepsin (Figure 5) and to determine whether hepsin 
possessed these same essential residues. 

(a) During the conversion of chymotrypsinogen to chymo- 
trypsin, the peptide backbone of segment 187-193 becomes 
more extended, resulting in the creation of a substrate binding 
pocket (Kraut, 1971). The peptide backbone of residues
Ser189-Ser190-Cys191-Met192 forms one side of this substrate
binding pocket in chymotrypsin (Steitz et al., 1969). This
sequence is Asp189-Ser190-Cys191-Gln192 in trypsin and
Asp189-Ala190-Cys191-Gln192 in hepsin.

(b) The opposite side of the substrate binding pocket in 
chymotrypsin is lined by residues Ser214-Trp215-Gly216. The
peptide backbone of these residues is thought to interact with 
the side chains of the substrate for properly orienting the bond 
that is to be cleaved (Steitz et al., 1969). This stretch of amino 
acids is also present in hepsin. 

(c) Hydrogen bonding between Cys191/Asp194 and
Asp194/Gly197 provides a rigid structure in the peptide
backbone of chymotrypsin in the vicinity of the active site. This
helps to hold the active-site Ser195 in the proper orientation
and is maintained only if Gly residues are present at positions
193 and 196 (Birktoft et al., 1970). Hepsin also has Gly
residues at these two positions.
of seven conserved regions (CR1-7) are essentially the same as those designated by Furie et al. (1982). Since variable regions show minimal sequence conservation, little attempt was made to optimize the homology in these regions. Otherwise, gaps have been inserted to bring the sequences into better alignment. Asterisks have been placed above the active-site residues His57, Asp102, and Ser195 that compose the catalytic triad. An arrow indicates the location of the extra Cys residue in the sequence of hepsin. Residues are underlined when the same amino acid is found at the same position in hepsin. The percentage listed in parentheses at the end of each sequence represents the extent of similarity between hepsin and that protein, as calculated from this alignment.



(d) All acidic (Asp and Glu) and basic (Arg, Lys, and His) side chains are placed on the surface of chymotrypsin, with the exception of Asp102 and Asp194, which are buried in the interior of the molecule. In trypsin, there is an additional buried acidic side chain at Asp189. Hepsin contains the two buried Asp residues common to both chymotrypsin and trypsin, namely, Asp102 and Asp194. In addition, at the position that has the greatest influence on substrate specificity (position 189), hepsin contains an Asp residue. Thus, it is predicted that hepsin would have a preference for substrates with basic side chains. It is of interest to note that Shotton and Watson (1970) made the prediction that a basic residue at position 189 might result in a serine protease with a preference for acidic side chains.

(e) In the three-dimensional model for elastase, the side chains of Val216 and Thr226, replacing Gly216 and Gly226 in chymotrypsin and trypsin, block the entrance of hydrophobic or charged substrates with bulky side chains from the binding pocket (Shotton & Hartley, 1970; Shotton & Watson, 1970). In hepsin, the presence of Gly residues at positions 216 and 226 is preserved.

(f) The side chain of residue 192 has been described as a flexible cover to the entrance of the substrate binding pocket in chymotrypsin (Steitz et al., 1969) and trypsin (Krieger et al., 1974). In chymotrypsin, Met192 may help provide a nonpolar environment for substrate side chains, whereas in trypsin Gln192 may provide a more polar environment. In hepsin, position 192 is Gln.

(g) The sequence Gly140-Trp141-Gly142 is highly conserved in serine proteases and is presumed to be involved in the activation process (Fehlhammer et al., 1977). This sequence is also present in hepsin.
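Points (d) and (e) amount to a rough rule for reading substrate preference from a few binding-pocket residues. The sketch below encodes only the rules stated in those two points (chymotrypsin numbering, one-letter codes); `predicted_p1_preference` is a hypothetical helper and a deliberate simplification, since real specificity also depends on many other residues and on the shape of the pocket.

```python
def predicted_p1_preference(res189: str, res216: str = "G", res226: str = "G") -> str:
    """Rough reading of points (d) and (e); chymotrypsin numbering, one-letter codes."""
    if res216 != "G" or res226 != "G":
        # e.g. elastase Val216/Thr226: bulky substrate side chains are excluded
        return "small side chains"
    if res189 == "D":
        # Asp189 at the bottom of the pocket, as in trypsin and hepsin
        return "basic side chains"
    if res189 in ("K", "R"):
        # Shotton & Watson (1970) prediction for a basic residue at 189
        return "acidic side chains"
    return "not decided by this rule"


print(predicted_p1_preference("D"))            # hepsin: Asp189, Gly216, Gly226
print(predicted_p1_preference("S", "V", "T"))  # Val216/Thr226; residue 189 is a placeholder
```

Applied to hepsin's Asp189, Gly216, and Gly226, the rule reproduces the trypsin-like prediction made in the text.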

The absence of a typical signal peptide and the presence of a potential transmembrane domain in hepsin are analogous to several other proteins recently described. Asialoglycoprotein receptor (Holland et al., 1984), transferrin receptor (Schneider et al., 1984), and plasma cell membrane glycoprotein PC-1 (van Driel & Goding, 1987) are examples of transmembrane proteins that lack a typical amino-terminal signal peptide cleaved during biosynthesis. These proteins possess hydrophobic domains near their amino termini that are thought to function as internal signal sequences. The hydrophobic domains direct insertion of these proteins into the membrane of the endoplasmic reticulum, leaving the amino terminus facing the cytoplasm and the carboxyl terminus facing into the lumen of the endoplasmic reticulum (Holland & Drickamer, 1986; Zerial et al., 1986; Wickner & Lodish, 1985; Spiess & Lodish, 1986). If a protein with a membrane-spanning domain is ultimately destined for the plasma membrane, its orientation at the cell surface is determined by the mechanism by which it was inserted into the membrane of the endoplasmic reticulum. For the cases mentioned above, the amino terminus faces the cytoplasm, whereas the carboxyl terminus is extracellular. The lack of an amino-terminal signal sequence and the presence of an internal hydrophobic domain in hepsin suggest that it is synthesized and integrated into membranes in a manner similar to the above-mentioned group of transmembrane proteins. If this were the case, then one would predict that the carboxyl-terminal catalytic chain of hepsin would be on the outside of the cell. There are many processes occurring extracellularly near the cell surface that involve limited proteolysis. Although these have not yet been well characterized, an activatable, trypsin-like, transmembrane serine protease may be an important participant in some of these processes.

It is difficult to speculate as to the true physiological function of hepsin. Since it may be a membrane-associated protein, it probably is not participating in such processes as coagulation, fibrinolysis, complement activation, etc., unless it is also being expressed by endothelial or blood cells. Since liver cells synthesize and secrete many different proteins, hepsin might be involved in the modification of other proteins as they are being synthesized or secreted. This could include the removal of propeptides from hormones, growth factors, or the vitamin K-dependent proteases, or the activation or inactivation of other proteins. It is unclear, however, how hepsin is converted from a zymogen to an active enzyme and whether this involves another serine protease or whether hepsin is capable of autoactivation. Answers to these questions will require additional experimentation.
A major protease from human breast cancer cells was previously detected by gelatin zymography and proposed to play a role in breast cancer invasion and metastasis. To structurally characterize the enzyme, we isolated a cDNA encoding the protease. Analysis of the cDNA reveals three sequence motifs: a carboxyl-terminal region with similarity to the trypsin-like serine proteases, four tandem cysteine-rich repeats homologous to the low density lipoprotein receptor, and two copies of tandem repeats originally found in the complement subcomponents C1r and C1s. By comparison with other serine proteases, the active-site triad was identified as His-484, Asp-539, and Ser-633. The protease contains a characteristic Arg-Val-Val-Gly-Gly motif that may serve as a proteolytic activation site. The bottom of the substrate specificity pocket was identified as Asp-627 by comparison with other trypsin-like serine proteases. In addition, this protease exhibits trypsin-like activity, as defined by cleavage of synthetic substrates with Arg or Lys as the P1 site. Thus, the protease is a mosaic protein with broad-spectrum cleavage activity and two potential regulatory modules. Given its ability to degrade extracellular matrix and its trypsin-like activity, the name matriptase is proposed for the protease.
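"Trypsin-like" here means cleavage on the carboxyl side of an Arg or Lys in the P1 position. The sketch below applies that P1 rule alone to a peptide string; `p1_sites` and `digest` are hypothetical helpers, and the rule deliberately ignores the other subsites, any proline effect, and structural accessibility, so it is not a model of matriptase's measured specificity.

```python
def p1_sites(peptide: str) -> list[int]:
    """1-based positions of Arg/Lys residues that could serve as P1, i.e. that
    are followed by another residue (the scissile bond is on their C-terminal side)."""
    return [i + 1 for i, aa in enumerate(peptide[:-1].upper()) if aa in "RK"]


def digest(peptide: str) -> list[str]:
    """Cut after every candidate P1 Arg/Lys and return the fragments."""
    cuts = [0] + p1_sites(peptide) + [len(peptide)]
    return [peptide[a:b] for a, b in zip(cuts, cuts[1:])]


print(digest("GRVKAR"))  # ['GR', 'VK', 'AR']
```

Note that the same P1 rule also marks the Arg of an Arg-Val-Val-Gly-Gly activation motif as a candidate cleavage site.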



Elevated proteolytic activity has been implicated in neoplastic progression. Although the exact role(s) of proteolytic enzymes in tumor progression remains unclear, proteases appear to be involved in almost every step of the development and spread of cancer. A widely held view is that proteases contribute to the degradation of extracellular matrix and to tissue remodeling and are necessary for cancer invasion and metastasis. A wide array of extracellular matrix-degrading proteases has been discovered, the expression of some of which correlates with tumor progression, as reviewed by Mignatti and Rifkin (1). The plasmin/urokinase-type plasminogen activator system and the 72-kDa gelatinase (MMP-2)/membrane-type MMP system have received the most attention for their potential roles in the invasion of breast cancer and other carcinomas. However, both systems appear to be largely synthesized by stromal cells in vivo (2-5) and require indirect mechanisms for their recruitment and activation on the surfaces of cancer cells. The stromal origins of these well-characterized extracellular matrix-degrading proteases may suggest that cancer invasion is an event that either depends entirely upon stromal-epithelial cooperation or is controlled by some other unknown epithelium-derived protease(s). A search for these epithelium-derived proteolytic systems that may interact with the plasmin/urokinase-type plasminogen activator system and/or with the MMP family could provide a missing link in our understanding of malignant invasion.

We have pursued studies of a novel protease with the hy- 
pothesis that a tumor itself may be a major source of proteases 
important for multiple aspects of malignant behavior, including
invasion and metastasis. To this end, using gelatin zymography,
we systematically varied several conditions, such as pH, to search
for potentially important breast cancer cell-derived gelatinases.
This search led us to the discovery of a
major protease, which on a gelatin zymogram had a slightly 
alkaline pH optimum and a size between those of MMP-2 and 
MMP-9 in T-47D human breast cancer cells (6). We now pro- 
pose to call this protease matriptase. Matriptase has been 
purified from T-47D cell-conditioned medium and has been 
used as an immunogen to produce monoclonal antibodies (7). 
Although matriptase was initially isolated from cell-condi- 
tioned medium, three lines of evidence, including immunoflu- 
orescence staining, surface biotinylation, and subcellular frac- 
tionation, suggested that a portion of the enzyme molecules 
were localized on the surfaces of cells. Given its extracellular 
matrix-degrading activity and presentation on the surfaces of 
breast cancer cells, we hypothesize that matriptase may be 
involved in breast cancer invasion. To further characterize the 
newly discovered matrix-degrading protease in this study, we 
have purified the enzyme and its binding protein from human 
milk, a biological source of relatively high abundance. A cDNA 
clone for matriptase has now been generated and characterized. 

MATERIALS AND METHODS 

Cell Lines and Culture Conditions — COS-7 cells were maintained in
modified Iscove's minimal essential medium (Biofluids, Inc., Rockville,
MD) supplemented with 5% fetal calf serum (Life Technologies, Inc.).

Purification of Matriptase — To obtain enough matriptase for amino 
acid sequencing, the enzyme was isolated from human milk (39). 
Briefly, human milk from the Georgetown University Medical Center 
Milk Bank was precipitated and collected by addition of ammonium 
sulfate between 40 and 60% saturation. Matriptase was purified by a 
combination of CM-Sepharose and immunoaffinity chromatography. 

Amino Acid Sequence Analysis — To obtain internal amino acid sequences, purified matriptase was separated by SDS-polyacrylamide gel electrophoresis and lightly stained with Coomassie Blue, and protein bands were excised. Matriptase was then subjected to in-gel digestion and amino acid sequencing at the Howard Hughes Medical Institute Biopolymer Laboratory and W. M. Keck Foundation Biotechnology Resource Laboratory at Yale University. The amino-terminal sequences were determined as described previously (8). Briefly, the proteins were resolved by SDS-polyacrylamide gel electrophoresis, transferred to polyvinylidene difluoride membrane, and lightly stained with Coomassie Blue. The proteins were then excised and subjected to amino-terminal sequencing in the Chemistry Department of Florida State University (Tallahassee, FL). The two short sequences obtained were identical to a deduced amino acid sequence from a cDNA termed SNC19 (GenBank accession number U20428).

Fig. 1. Purification of matriptase in its 95-kDa complexed form from human milk. The partially purified 95-kDa matriptase complex from ion-exchange chromatography was loaded onto a mAb 21-9-Sepharose column. The bound proteins were eluted by glycine buffer, pH 2.4, and neutralized by addition of 2 M Trizma. The eluted proteins were incubated in 1x SDS sample buffer in the absence of reducing agents at room temperature (lanes 1, -Boil) or at 95 °C (lanes 2, +Boil) for 5 min. The samples were resolved by SDS-polyacrylamide gel electrophoresis and either stained by colloidal Coomassie (A) or subjected to immunoblot analysis using mAb 21-9 (B) or gelatin zymography (C). The 95-kDa matriptase complex was eluted from this affinity column as the major protein (A, lane 1); it was recognized by mAb 21-9 (B, lane 1); and it also exhibited gelatinolytic activity (C, lane 2). The 95-kDa matriptase complex was converted to matriptase by boiling (A, lane 2). The gelatinolytic activity of the 95-kDa protease was destroyed by boiling, but a low level of the gelatinolytic activity survived and was converted to matriptase (C, lane 2). A low level of uncomplexed matriptase was copurified with the 95-kDa matriptase complex by affinity chromatography (A, lane 1); it also exhibited gelatinolytic activity (C, lane 1). Immunoblot analysis enhanced the signal of the uncomplexed matriptase and reconfirmed its existence (B, lane 1). Several other polypeptides were also seen (A, lanes 1 and 2). Some of them could be degradation products of the protease, since they were recognized by mAb 21-9 after longer exposure to the x-ray film. A 40-kDa protein doublet was seen at low levels in a nonboiled sample (A, lane 1), but its levels were increased after boiling (A, lane 2). This 40-kDa doublet was not recognized by mAb 21-9 (B). We propose that these two polypeptides could be binding proteins (BPs) of matriptase. The sizes of the molecular mass markers are indicated.

Amplification of an SNC19 cDNA from T-47D Breast Cancer Cells. An SNC19 cDNA clone was generated by reverse transcriptase-polymerase chain reaction using mRNA from T-47D human breast cancer cells. The primer sequences for SNC19 (5'-CCTCCTCTTGGTCTTGCTGGGG-3' and 5'-AGACCCGTCTGmTCCAGG-3') were derived from the published sequence. Standard reverse transcription-polymerase chain reaction was performed with the Advantage RT-PCR kit (CLONTECH). Products were analyzed on a 0.8% agarose gel. The resulting band of ~2.8 kilobase pairs, which corresponded to the expected product size, was excised from the gel, purified, and ligated into pCR2.1 (Invitrogen, San Diego, CA) by TA cloning (pCR-SNC19).

Sequencing. DNA sequencing was performed on an Applied Biosystems automated 377 DNA sequencer using standard methods, with the assistance of the Lombardi Cancer Center Sequencing and Synthesis Shared Resource. The sequences were assembled and analyzed with Lasergene software for Windows (DNASTAR, Inc., Madison, WI). The predicted protein sequence was compared with sequences in the Swiss-Prot database at the National Center for Biotechnology Information using the BLAST network server.

Fig. 2. Western blot analysis of SNC19 protein expressed in COS cells using anti-matriptase mAb M32. The SNC19 fragment generated by reverse transcriptase-polymerase chain reaction was inserted into the expression vector pcDNA3.1 and transfected into COS-7 cells. Three samples were analyzed by Western blot using anti-matriptase mAb M32: lysates of SNC19-transfected COS-7 cells (lane 1), lysates of control COS-7 cells (lane 2), and the conditioned medium of T-47D human breast cancer cells (lane 3).

Expression of SNC19 in COS-7 Cells. To verify that SNC19 encodes matriptase, we constructed a eukaryotic expression vector (pcDNA/SNC19) from the commercially available pcDNA3.1 vector (Invitrogen, San Diego, CA). Digestion of pCR-SNC19 released a 2.83-kilobase pair EcoRI fragment containing the SNC19 cDNA, which was cloned into the EcoRI site of pcDNA3.1. In this construct, the cytomegalovirus promoter drives the open reading frame of SNC19. Correct insertion of the SNC19 cDNA was verified by restriction mapping (data not shown). Transfections were carried out with SuperFect transfection reagent (QIAGEN Inc., Valencia, CA) as specified in the manufacturer's handbook. Control COS-7 cells were transfected with LacZ to monitor transfection efficiency. After 48 h, the SNC19-transfected and control COS-7 cells were extracted with 1% Triton X-100 in 20 mM Tris-HCl, pH 7.4.

Immunoblot Analysis. Immunoblotting was conducted as described previously (7). Proteins were separated by 10% SDS-polyacrylamide gel electrophoresis, transferred to polyvinylidene fluoride membrane, and then probed with anti-matriptase mAb M32. Immunoreactive polypeptides were visualized with peroxidase-labeled secondary antiserum and the ECL detection system (Amersham Pharmacia Biotech).

Gelatin Zymography. Gelatin zymography was carried out as described previously, with some modifications (13). Gelatin (1 mg/ml) was copolymerized as a substrate into a regular SDS-polyacrylamide gel. Electrophoresis was performed at a constant current of 15 mA. The gelatin gels were washed three times with phosphate-buffered saline containing 2% Triton X-100 and were then incubated overnight in phosphate-buffered saline at 37 °C.

Cleavage of Synthetic Substrates. To demonstrate the trypsin-like activity of matriptase, we tested purified matriptase from human milk against various synthetic fluorescent protease substrates that carry arginine or lysine at the P1 site. Matriptase was assayed in 20 mM Tris buffer, pH 8.5, at 25 °C in a volume of 190 µl. Then 10 µl of 2 mM substrate solution was added, giving a final substrate concentration of 0.1 mM. The substrates included t-butyloxycarbonyl (Boc)-Gln-Ala-Arg-7-amino-4-methylcoumarin (AMC), Boc-benzyl-Glu-Gly-Arg-AMC, Boc-Leu-Gly-Arg-AMC, Boc-benzyl-Asp-Pro-Arg-AMC, Boc-Phe-Ser-Arg-AMC, Boc-Val-Pro-Arg-AMC, succinyl-Ala-Phe-Lys-AMC, Boc-Leu-Arg-Arg-AMC,
Boc-Gly-Lys-Arg-AMC, and Boc-Leu-Ser-Thr-Arg-AMC. These sub- 

The abbreviations used are: mAb, monoclonal antibody; Boc, t-butyloxycarbonyl; AMC, 7-amino-4-methylcoumarin; LDL, low density lipoprotein.
Fig. 3. Nucleotide and deduced amino acid sequences of a matriptase cDNA clone. The primers used for reverse transcriptase-polymerase chain reaction (20 bases at the 5'-end and 18 bases at the 3'-end) are underlined. Thirty-three bases beyond the 5'-end primer and 92 bases beyond the 3'-end primer were taken from the SNC19 cDNA and incorporated. The cDNA sequence was translated from the fifth ATG codon in the open reading frame. Nucleotide and amino acid numbers are shown on the left. Sequences that agreed with the internal sequences obtained from matriptase are double-underlined. His-484, Asp-539, and Ser-633 are boxed and mark the putative catalytic triad of matriptase. Potential N-glycosylation sites are indicated (A). An RGD sequence is indicated (♦).
[Fig. 3 sequence panel: nucleotide and deduced amino acid sequences of the matriptase cDNA clone.]



strates were purchased from Sigma. The rate of cleavage of individual 
substrate was determined against time with a Hitachi F-4500 fluores- 
cence spectrophotometer. 
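The assay arithmetic above, and one common way to turn a fluorescence trace into a cleavage rate, can be sketched as follows. This is a minimal sketch. The volumes and concentrations come from the Methods text. The fluorescence trace is invented. Using a least-squares slope as the "rate of cleavage" is an assumption, because the text does not say how the rates were calculated.

```python
# Sketch only: the assay volumes come from the Methods text; the fluorescence
# trace below is hypothetical, not data from the paper.


def final_concentration(stock_mM: float, added_ul: float, buffer_ul: float) -> float:
    """Concentration (mM) after adding `added_ul` of stock to `buffer_ul` of assay mix (C1*V1 = C2*V2)."""
    return stock_mM * added_ul / (added_ul + buffer_ul)


def initial_rate(times_s: list, signal: list) -> float:
    """Least-squares slope of signal versus time (signal units per second).

    Assumes the readings come from the early, linear phase of product release.
    """
    n = len(times_s)
    if n != len(signal) or n < 2:
        raise ValueError("need at least two paired time points")
    mean_t = sum(times_s) / n
    mean_y = sum(signal) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, signal))
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    return sxy / sxx


# 10 ul of 2 mM substrate added to 190 ul gives 0.1 mM in 200 ul.
print(final_concentration(2.0, 10.0, 190.0))  # 0.1

# Hypothetical linear trace rising by 5 arbitrary fluorescence units per second.
t = [0.0, 30.0, 60.0, 90.0, 120.0]
f = [100.0 + 5.0 * x for x in t]
print(initial_rate(t, f))  # 5.0
```

Ranking substrates then comes down to comparing these slopes under identical enzyme and substrate concentrations.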

RESULTS AND DISCUSSION 

Purification of Matriptase from Human Milk. In our previous study (7), a small proportion of the matriptase molecules in human breast cancer cells was identified as complexes. We subsequently found that human milk is a good source for isolating larger quantities of matriptase complexes (39). We first purified a matriptase complex with an apparent size of 95 kDa from human milk by anti-matriptase mAb 21-9-Sepharose affinity chromatography (Fig. 1A). Boiling converts the 95-kDa complex to matriptase plus a 40-kDa protein doublet. Anti-matriptase mAb 21-9 recognized both the 95-kDa complex and matriptase itself (Fig. 1B). Sequence analysis showed that the 40-kDa binding protein is a serine protease inhibitor (see below). Even so, the 95-kDa matriptase-inhibitor complex retained some residual gelatinolytic activity (Fig. 1C). N-terminal sequencing of matriptase and its binding protein yielded only 11 amino acid residues (VVGGTDADEGE) from matriptase, with relatively low recovery. It also yielded 12 amino acid residues (GPPPAPPGLPAG) from the amino terminus of the 40-kDa binding protein. We searched GenBank™ with these amino acid sequences for proteins related or corresponding to matriptase and its binding protein. The search identified the binding protein of matriptase as a Kunitz-type serine protease inhibitor. This inhibitor is a known reversible and competitive serine protease inhibitor that was reported to inhibit the hepatocyte growth factor activator, and it was therefore named HAI (9). The detailed characterization of HAI from the matriptase complex is reported in the accompanying paper (39). The 11 amino acid residues from matriptase were identical to a deduced amino acid sequence from a 2.9-kilobase pair cDNA called SNC19. We subsequently obtained nine internal amino acid residues (DYVEINGEK) from matriptase, and these also matched the predicted protein sequence of SNC19. However, the deposited SNC19 sequence contains numerous stop codons, so it predicts several small translation products. We therefore obtained a 2830-base pair cDNA fragment by reverse transcriptase-polymerase chain reaction, using two primers based on the sequence of SNC19. We observed extensive discrepancies (132 bases) between our sequence and that of SNC19. These analyses suggest either that the deposited SNC19 sequence contains errors or that this cDNA encodes a distinct but related protein(s).

Fig. 4. Comparison of the amino acid sequence of the C-terminal region of matriptase with trypsin, chymotrypsin, and the catalytic domains of other serine proteases. The C-terminal region of matriptase (amino acids 431-683) is compared with the following: human trypsin (21); human chymotrypsin (22); the catalytic chains of human enteropeptidase (16), human hepsin (17), human blood coagulation factor XI (19), and human plasminogen; and the serine protease domains of two transmembrane serine proteases, human TMPRSS2 (32) and the Drosophila Stubble-stubbloid gene product (Sb-sbd) (33). Dashes indicate gaps introduced to maximize homology. Residues in the catalytic triads (matriptase His-484, Asp-539, and Ser-633) are boxed and indicated (A). The conserved activation motif ((R/K)VIGG) is boxed, and the proteolytic activation site is indicated. Eight conserved cysteines that form four intramolecular disulfide bonds are boxed. Their likely pairings are Cys-469-Cys-485, Cys-604-Cys-618, Cys-629-Cys-658, and Cys-432-Cys-559. The Cys-432-Cys-559 disulfide bond is found in two-chain serine proteases but not in trypsin or chymotrypsin. Residues in the substrate pocket (Asp-627, Gly-655, and Gly-665) are boxed and indicated (*). The residue at the bottom of the substrate pocket is Asp in trypsin-like proteases, including matriptase, but Ser in chymotrypsin.
Verification of SNC19 cDNA Encoding Matriptase. Beyond the sequence identity between matriptase and a portion of SNC19, we tested whether anti-matriptase mAbs recognize the SNC19 product, to verify that SNC19 encodes matriptase. SNC19 cDNA was inserted into the eukaryotic expression vector pcDNA3.1 and transfected into COS-7 monkey kidney fibroblasts, which do not express matriptase. Anti-matriptase mAb M32 detected an immunoreactive band in SNC19-transfected COS-7 cells (lane 1) but not in control COS-7 cells (lane 2). This band was the same size as matriptase from T-47D human breast cancer cells (Fig. 2, lane 3). The internal amino acid sequences of matriptase are identical to the deduced amino acid sequence of SNC19. Taken together with this identity, these results suggest that SNC19 encodes matriptase.
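The peptide-matching step described above (checking that sequenced matriptase peptides occur in the protein deduced from SNC19) can be sketched as an exact substring search that reports 1-based residue numbers. This is an illustration only. `TOY_PROTEIN` is an invented sequence, not matriptase, and `locate_peptide` is a hypothetical helper name.

```python
def locate_peptide(protein: str, peptide: str) -> list:
    """Return every (start, end) match of `peptide` in `protein`, 1-based and inclusive."""
    hits = []
    start = protein.find(peptide)
    while start != -1:
        hits.append((start + 1, start + len(peptide)))
        start = protein.find(peptide, start + 1)  # allow overlapping matches
    return hits


# Hypothetical toy sequence that merely embeds the two sequenced peptides.
TOY_PROTEIN = "MKTAYDYVEINGEKAGSRVVGGTDADEGEWPW"

print(locate_peptide(TOY_PROTEIN, "DYVEINGEK"))    # [(6, 14)]
print(locate_peptide(TOY_PROTEIN, "VVGGTDADEGE"))  # [(19, 29)]
```

In the paper the same kind of lookup places the two peptides at residues 228-236 and 443-453 of the deduced matriptase sequence. An exact match in the expected frame is what supports the reading frame.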

Fig. 5. Alignment of partial sequences of the noncatalytic domain with homologous regions in other proteins. A, the cysteine-rich repeats of matriptase (amino acids 280-314, 315-351, 352-387, and 394-430) are compared with the consensus sequences of the human LDL receptor (23), LDL receptor-related protein (LRP) (24), human perlecan (34), and rat GP-330 (35). The consensus sequences are boxed. B, the C1r/s-type sequences of matriptase (Mt; amino acids 42-155 and 168-268) are compared with selected domains of human complement subcomponent C1r (amino acids 193-298) (25, 26), C1s (amino acids 175-283) (27, 28), Ra-reactive factor (RaRF) (amino acids 185-290) (36, 37), and a calcium-dependent serine protease (CSP) (amino acids 181-289) (38). The consensus sequences are boxed.

[Fig. 5 alignment panels. A, LDL receptor-type cysteine-rich repeats of matriptase aligned with LDL receptor, LRP, perlecan, and GP-330 consensus sequences. B, C1r/s-type region: Mt(1) 42-155, Mt(2) 168-268, C1r 193-298, C1s 175-283, RaRF 185-290, CSP 181-289.]

[Fig. 6 schematic, NH2 to COOH: putative signal peptide; two C1r/s domains; LDL receptor domains I-IV; serine protease domain.]

Fig. 6. Domain structure of matriptase. The schematic shows the domain structure of matriptase. The protease consists of 683 amino acids, and the protein product has a calculated mass of 75,626 Da. It contains two tandem domains homologous to complement subcomponents C1r and C1s, and four tandem LDL receptor domains. The serine protease domain is at the carboxyl terminus.

Nucleotide and Predicted Amino Acid Sequences of a Matriptase cDNA Clone. Fig. 3 shows the nucleotide and amino acid sequences of SNC19. The matriptase cDNA is likely to be 2955 base pairs long: the 2830-base pair reverse transcriptase-polymerase chain reaction fragment plus 33 bases at the 5'-end and 92 bases at the 3'-end taken from SNC19. We assigned the translation initiation site to the fifth methionine codon because its sequence, GTCATGG, matches a favorable Kozak consensus sequence (10). This methionine is followed by four positively charged amino acids and then a 14-amino acid hydrophobic region (Ser-18-Ser-31), which is a putative signal peptide. If this methionine codon is the initiator, the open reading frame is 2049 base pairs long. The deduced amino acid sequence then contains 683 residues with a calculated molecular mass of 75,626 Da. The two amino acid sequences obtained from matriptase (DYVEINGEK and VVGGTDADEGE) lie at amino acids 228-236 and 443-453, so the translation frame is likely to be correct. The sequence contains three potential N-glycosylation sites with the canonical Asn-X-(Ser/Thr) sequence, as well as an RGD sequence. RGD sequences in extracellular matrix proteins have been found to mediate their interactions with integrins (11).
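The sequence features named in the paragraph above can be checked mechanically. The length bookkeeping is arithmetic. The Kozak call can be read from positions -3 and +4 around the ATG. The Asn-X-(Ser/Thr) sequons and RGD motifs can be found by pattern matching. The block below is a minimal sketch of those three checks. Two points are assumptions: scoring Kozak context on only the -3 and +4 positions, and excluding Pro at X in the sequon pattern. The text states neither. The toy peptide is invented and is not the matriptase sequence.

```python
import re

# Sequon: Asn, any residue except Pro (an assumed refinement; the text says only
# Asn-X-(Ser/Thr)), then Ser or Thr. The lookahead also reports overlapping matches.
SEQUON = re.compile(r"(?=N[^P][ST])")
RGD = re.compile(r"(?=RGD)")


def kozak_context(mrna: str, atg_index: int) -> dict:
    """Score two key Kozak positions around an ATG whose A sits at 0-based `atg_index`.

    With the A of ATG numbered +1, a strong context has a purine (A/G) at -3
    and a G at +4.
    """
    if mrna[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    minus3 = mrna[atg_index - 3] if atg_index >= 3 else ""
    plus4 = mrna[atg_index + 3] if atg_index + 3 < len(mrna) else ""
    return {"purine_at_-3": minus3 in ("A", "G"), "G_at_+4": plus4 == "G"}


def motif_positions(pattern: re.Pattern, protein: str) -> list:
    """1-based start positions of every (possibly overlapping) match."""
    return [m.start() + 1 for m in pattern.finditer(protein)]


# The context cited in the text around the fifth ATG.
print(kozak_context("GTCATGG", 3))  # {'purine_at_-3': True, 'G_at_+4': True}

# Hypothetical toy peptide, not matriptase.
toy = "ANLTKRGDSNPSQNGT"
print(motif_positions(SEQUON, toy))  # [2, 14]
print(motif_positions(RGD, toy))     # [6]

assert 2830 + 33 + 92 == 2955  # cDNA length: RT-PCR fragment plus SNC19-derived ends
assert 683 * 3 == 2049         # ORF length in the text, stop codon not counted
```

The sequon at position 10 of the toy peptide (Asn-Pro-Ser) is deliberately skipped by the Pro-excluding rule. Drop `[^P]` in favor of `.` if the plain Asn-X-(Ser/Thr) definition is wanted.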

Structure of the Matriptase Catalytic Domain. A BLAST homology search of the Swiss-Prot database with the deduced amino acid sequence showed that the carboxyl terminus of matriptase (residues 432-683) is homologous to other serine proteases. Matriptase contains the invariant catalytic triad, a characteristic disulfide bond pattern, and overall sequence similarity to these enzymes. By comparison with the archetype serine protease chymotrypsin (12, 13) and other serine proteases, His-484, Asp-539, and Ser-633 of matriptase likely correspond to His-57, Asp-102, and Ser-195 of chymotrypsinogen and are likely essential for catalytic activity (14). In other chymotrypsin-related proteases, six highly conserved cysteines form three intramolecular disulfide bonds that stabilize the catalytic pocket. The most likely cysteine pairings in matriptase are therefore Cys-469-Cys-485, Cys-604-Cys-618, and Cys-629-Cys-658. Matriptase also contains two additional cysteines (Cys-432 and Cys-559). These correspond to cysteines used in two-chain proteases such as enteropeptidase (15, 16), hepsin (17), plasma kallikrein (18), blood coagulation factor XI (19), and plasminogen (20), but they are absent from trypsin (21) and chymotrypsin (22) (Fig. 4).
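Statements such as "His-484 of matriptase corresponds to His-57 of chymotrypsinogen" come from reading residue numbers across a gapped alignment like Fig. 4. The sketch below maps residue numbers from one aligned row to the other. Columns with a gap in either row are skipped. `residue_map` is a hypothetical helper, and the three-column alignment in the example is invented only to show the bookkeeping.

```python
def residue_map(aligned_a: str, aligned_b: str, start_a: int = 1, start_b: int = 1) -> dict:
    """Map residue numbers of sequence A onto sequence B through a gapped pairwise alignment.

    '-' marks a gap. Residues facing a gap get no partner; each row's counter
    advances only on its own non-gap characters.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    mapping = {}
    num_a, num_b = start_a, start_b
    for ca, cb in zip(aligned_a, aligned_b):
        if ca != "-" and cb != "-":
            mapping[num_a] = num_b
        if ca != "-":
            num_a += 1
        if cb != "-":
            num_b += 1
    return mapping


# Invented toy alignment: row A starts at residue 482, row B at residue 56,
# so the aligned His of row A (484) maps to residue 57 of row B.
print(residue_map("GAH", "G-H", start_a=482, start_b=56))  # {482: 56, 484: 57}
```

Residue 483 of row A has no partner because it faces a gap. This is why the text speaks of corresponding residues only where the alignment is gap-free.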

The putative proteolytic activation site of matriptase (Arg-442) lies in an Arg-Val-Val-Gly-Gly motif, which resembles the characteristic RIVGG motif of other serine proteases. As mentioned above, a conserved intramolecular disulfide bond occurs in serine proteases that are synthesized as single-chain zymogens and proteolytically activated to two-chain forms. This disulfide bond is proposed to hold the active catalytic fragment together with the noncatalytic N-terminal fragment. Matriptase also contains this conserved intramolecular disulfide bond (Cys-432-Cys-559). These sequence analyses suggest that matriptase may be synthesized as a single-chain zymogen and then proteolytically activated to a two-chain form. If so, most matriptase molecules in the conditioned medium of T-47D breast cancer cells are probably in the zymogen form, and two-chain matriptase makes up only a minor proportion of the total. This interpretation is consistent with the apparent size of 80 kDa of purified matriptase from T-47D human breast cancer cells under reducing conditions (data not shown). Further support comes from N-terminal sequencing of milk-derived matriptase. That analysis yielded, with very low recovery, a stretch of amino acid residues (VVGGTDADEGE) identical to the proposed N-terminal sequence of the matriptase catalytic chain.
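The argument that only the Cys-432-Cys-559 bond holds the two chains together after activation can be checked directly. A disulfide keeps the fragments linked only if its two cysteines fall on opposite sides of the cleaved bond. The sketch below assumes two things: cleavage after Arg-442, and the pairings the text calls "most likely". Both are proposals, not determined structures, and the helper names are hypothetical.

```python
# Likely pairings as proposed in the text (not experimentally determined).
MATRIPTASE_BONDS = [(469, 485), (604, 618), (629, 658), (432, 559)]


def interchain_disulfides(bonds: list, cleavage_after: int) -> list:
    """Disulfides whose cysteines lie on opposite sides of the bond after residue `cleavage_after`."""
    return [(a, b) for a, b in bonds if min(a, b) <= cleavage_after < max(a, b)]


def each_cysteine_used_once(bonds: list) -> bool:
    """Sanity check: no cysteine appears in more than one proposed disulfide."""
    cysteines = [c for pair in bonds for c in pair]
    return len(cysteines) == len(set(cysteines))


# Assumed activation cleavage between Arg-442 and Val-443.
print(interchain_disulfides(MATRIPTASE_BONDS, 442))  # [(432, 559)]
print(each_cysteine_used_once(MATRIPTASE_BONDS))     # True
```

The other three bonds close loops within the catalytic chain. This matches the text's distinction between the three pocket-stabilizing disulfides and the extra bond shared with two-chain proteases.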

The substrate specificity (S1) pocket of matriptase is likely formed by Asp-627 at its bottom, with Gly-655 and Gly-665 at its neck. This arrangement indicates that matriptase is a typical trypsin-like serine protease. We tested the predicted preference of matriptase for cleavage after residues with positively charged side chains using 10 synthetic substrates with Arg or Lys at the P1 site. In our preliminary studies (data not shown), matriptase cleaved the following synthetic substrates, listed from fastest to slowest: Boc-Gln-Ala-Arg-AMC, Boc-benzyl-Glu-Gly-Arg-AMC, Boc-Leu-Gly-Arg-AMC, Boc-benzyl-Asp-Pro-Arg-AMC, Boc-Phe-Ser-Arg-AMC, Boc-Val-Pro-Arg-AMC, succinyl-Ala-Phe-Lys-AMC, Boc-Leu-Arg-Arg-AMC, Boc-Gly-Lys-Arg-AMC, and Boc-Leu-Ser-Thr-Arg-AMC. Matriptase may therefore prefer substrates with small side chains, such as Ala and Gly, at the P2 site.
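The P2 inference above comes from reading the residue just before the P1 Arg or Lys in each substrate name, then comparing it with the cleavage ranking. The sketch below parses the names listed in the text (in the text's fastest-to-slowest order) and extracts P1 and P2. The parsing rule is an assumption: blocking groups (Boc, succinyl), the benzyl side-chain protection, and the AMC leaving group are dropped as non-residues. The ranking itself is the authors' preliminary, unshown data.

```python
# Order as listed in the text, fastest cleavage first (preliminary data, not shown there).
SUBSTRATES_FASTEST_FIRST = [
    "Boc-Gln-Ala-Arg-AMC",
    "Boc-benzyl-Glu-Gly-Arg-AMC",
    "Boc-Leu-Gly-Arg-AMC",
    "Boc-benzyl-Asp-Pro-Arg-AMC",
    "Boc-Phe-Ser-Arg-AMC",
    "Boc-Val-Pro-Arg-AMC",
    "succinyl-Ala-Phe-Lys-AMC",
    "Boc-Leu-Arg-Arg-AMC",
    "Boc-Gly-Lys-Arg-AMC",
    "Boc-Leu-Ser-Thr-Arg-AMC",
]

AMINO_ACIDS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}


def residues(name: str) -> list:
    """Amino acid residues in N-to-C order; protecting groups and AMC are skipped."""
    return [part for part in name.split("-") if part in AMINO_ACIDS]


def p1_p2(name: str) -> tuple:
    """(P1, P2): the residue whose bond to AMC is cleaved, and the one before it."""
    res = residues(name)
    return res[-1], res[-2]


print([p1_p2(s)[1] for s in SUBSTRATES_FASTEST_FIRST])
# ['Ala', 'Gly', 'Gly', 'Pro', 'Ser', 'Pro', 'Phe', 'Arg', 'Lys', 'Thr']
```

The three fastest substrates carry Ala or Gly at P2, while bulkier or charged P2 residues fall toward the slow end. That pattern is the basis of the tentative preference stated above.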

Structure Motifs of the Noncatalytic Region of Matriptase. The noncatalytic region of matriptase contains two sets of repeating sequences, which may serve as regulatory and/or binding domains for interactions with other proteins. Four tandem repeats of ~35 amino acids, each including six conserved cysteine residues (Fig. 5A), lie amino-terminal to the serine protease domain (amino acids 280-430). They are homologous to the cysteine-containing repeat of the LDL receptor (23) and related proteins (24). All of these cysteine residues are likely to be involved in disulfide bonds. In the LDL receptor, seven homologous repeats form the ligand-binding domain. By analogy, the four tandem cysteine-containing repeats in matriptase may also be sites of interaction with other macromolecules. The cysteine-containing LDL receptor domain also occurs in other proteases, such as enteropeptidase (15, 16).

The amino-terminal region of matriptase (amino acids 42-268) contains a second pair of tandem segments with internal homology. These segments resemble partial sequences first identified in complement subcomponents C1r (25, 26) and C1s (27, 28). This C1r/s domain also occurs in other serine proteases, such as enteropeptidase, an activator of trypsinogen (15, 16). It is also found in the astacin subfamily of zinc metalloproteases, for example bone morphogenetic protein-1 (29) and the product of the Drosophila tolloid gene, a dorsal-ventral patterning protein (30). The exact roles of the C1r/s domains in these proteins remain unclear. However, deleting the first C1r/s domain of complement subcomponent C1r impairs tetramer formation of C1r with C1s (31), which suggests that this domain may take part in protein-protein interactions. In our previous study (7), a small proportion of the matriptase in breast cancer cells was found in complexes. One of these complexes has been isolated from human milk, and its binding protein was identified as a fragment of a Kunitz-type serine protease inhibitor. Whether the LDL receptor domains and the C1r/s domains of matriptase both participate in the interaction with this inhibitor remains to be investigated.

In conclusion, matriptase is a trypsin-like serine protease 
with several potential regulatory modules (Fig. 6). Its broad 
spectrum cleavage activity may contribute to the degradation 
of the extracellular matrix, activation of other proteases, and 
processing of growth factors. All of these ascribed functions 
could contribute to important aspects of tumor progression 
such as cancer invasion, and to physiological processes such as
differentiation and lactation. The presence of potential protein- 
protein interaction domains and ligand-binding domains in 
matriptase suggests that the interaction of matriptase with 
other macromolecules on the cell surface (such as the luminal 
surface of the mammary gland) may regulate its activation, 
inhibition, and presentation. Aberrant regulation of matriptase 
processing may be involved in the malignant progression of 
cancers. 
Enteropeptidase is a membrane-bound serine protease that initiates the activation of pancreatic hydrolases by cleaving and activating trypsinogen. The enzyme is remarkably specific and cleaves after lysine residues of peptidyl substrates that resemble trypsinogen activation peptides, such as Val-(Asp)4-Lys. To characterize the determinants of substrate specificity, we solved the crystal structure of the bovine enteropeptidase catalytic domain to 2.3 Å resolution in complex with the inhibitor Val-(Asp)4-Lys-chloromethane. The catalytic mechanism and the contacts with lysine at substrate position P1 are conserved with other trypsin-like serine proteases. However, the aspartyl residues at positions P2-P4 of the inhibitor interact with the enzyme surface mainly through salt bridges with the Nζ atom of Lys99. Mutating Lys99 to Ala, or acetylating it with acetic anhydride, specifically prevented the cleavage of trypsinogen or Gly-(Asp)4-Lys-β-naphthylamide. These modifications also reduced the rate of inhibition by Val-(Asp)4-Lys-chloromethane 22- to 90-fold. For these reactions, Lys99 was calculated to account for 1.8 to 2.5 kcal mol⁻¹ of the free energy of transition state binding. Thus, a unique basic exosite on the enteropeptidase surface has evolved to facilitate the cleavage of its physiological substrate, trypsinogen.
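The abstract converts fold-changes in rates into kcal mol⁻¹ of transition-state binding energy. The standard relation for such a conversion is ΔΔG‡ = RT ln(ratio of rate or specificity constants). The sketch below applies that relation. The abstract does not give the authors' exact inputs or temperature, so 25 °C is an assumption here, and the example reproduces only the order of magnitude for a 22-fold effect. `ddg_from_fold` is a hypothetical helper name.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def ddg_from_fold(fold: float, temp_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) equivalent to a `fold` change in a rate constant: RT ln(fold)."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temp_k * math.log(fold)


# Assumed temperature of 25 C; a 22-fold effect corresponds to roughly 1.8 kcal/mol.
print(round(ddg_from_fold(22), 2))  # 1.83
```

Because the energy scales with the logarithm of the fold change, each additional 10-fold effect adds only about 1.4 kcal/mol at this temperature. A single side chain can therefore account for a large kinetic effect with a modest energetic contribution.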
Introduction

Enteropeptidase was discovered one hundred years ago in I. P. Pavlov's laboratory (Pavlov, 1902) as the first known enzyme to activate other enzymes. It remains a remarkable example of how evolution has shaped serine proteases to regulate metabolic pathways. Enteropeptidase controls a primordial enzymatic cascade that is conserved among vertebrates and is essential for normal intestinal digestion. When pancreatic secretions enter the duodenum, enteropeptidase recognizes the acidic activation peptide of trypsinogen and cleaves it. The resulting trypsin then cleaves and activates the other zymogens in pancreatic fluid, enabling the digestion of food. In humans, congenital deficiency of enteropeptidase causes severe intestinal malabsorption with diarrhea, vomiting, and growth failure. The condition can be treated successfully by supplementation with pancreatic extract (Hadorn et al., 1969; Haworth et al., 1971).

Several enteropeptidase domains are required for efficient activation of trypsinogen. Enteropeptidase is a two-chain polypeptide derived from a single-chain precursor. It consists of an N-terminal ~120 kDa heavy chain that is disulfide-linked to a C-terminal ~47 kDa light chain. A transmembrane segment in the heavy chain anchors enteropeptidase in the brush border of duodenal enterocytes. The light chain consists of a chymotrypsin-like serine protease domain (reviewed by Lu & Sadler, 1998). Replacing the transmembrane domain with a cleavable signal peptide does not impair trypsinogen activation, so membrane association is not required for substrate recognition (Lu et al., 1997). Removing heavy chain domains reduces the rate of trypsinogen activation ~500-fold, whether the domains are removed by reduction (Light & Fonseca, 1984), proteolysis (Mikhailova & Rumsh, 1999), or mutagenesis (LaVallie et al., 1993; Lu et al., 1997). The heavy chain is therefore necessary for optimal cleavage of trypsinogen. The enteropeptidase light chain, however, is sufficient for normal recognition of small peptidyl substrates that resemble the trypsinogen activation peptide Val-(Asp)4-Lys (LaVallie et al., 1993; Lu et al., 1997).

The structural determinants of substrate specificity have not been identified on the enteropeptidase light chain, but their locations have been proposed on the basis of comparisons with other serine proteases. The enteropeptidase serine protease domain contains a basic tetrapeptide segment. This segment is Arg96-Arg-Arg-Lys99 in porcine (Matsushima et al., 1994), mouse (Yuan et al., 1998), and human (Kitamoto et al., 1994) enteropeptidase. It is Lys96-Arg-Arg-Lys99 in bovine (Kitamoto et al., 1994; LaVallie et al., 1993) and rat enteropeptidase (Yahagi et al., 1996). Other serine proteases do not conserve this segment. Computer modeling places it on the protein surface, where it might bind the acidic P2-P5 residues of trypsinogen activation peptides (Kitamoto et al., 1994; Matsushima et al., 1994) (see the legend to Figure 2 for the residue numbering). Enteropeptidase therefore appears to have an extended binding site, or "exosite", distinct from the catalytic center, which recognizes substrate amino acid residues on the N-terminal side of the cleaved bond. At present there is no evidence that enteropeptidase has specificity for amino acid residues C-terminal to the scissile bond.

Similar exosites in other highly regulated serine proteases are well documented to control the recognition of substrates, cofactors, and inhibitors. For example, the blood clotting protease thrombin has two so-called "anion-binding exosites" (Bode et al., 1992). Exosite 1 interacts with acidic regions of preferred substrates such as fibrinogen and of cofactors such as thrombomodulin. Unlike what is known for enteropeptidase, however, thrombin exosite 1 interacts with amino acid residues on the C-terminal side of the cleaved bond. Thrombin exosite 2 lies on the opposite side of the molecule and binds heparin, which promotes the inhibition of thrombin by antithrombin (Sheehan & Sadler, 1994). These exosites have been modified by mutagenesis to create thrombin variants with novel properties (Sheehan & Sadler, 1994; Wu et al., 1991). Characterizing enteropeptidase exosites by analogous approaches would advance our understanding of how digestion is regulated and would facilitate the design of enteropeptidase derivatives with new substrate specificity.

We have now determined the crystal structure of the bovine enteropeptidase light chain complexed with an inhibitor, Val-(Asp)4-Lys-chloromethane (VD4K-cm), that mimics the trypsinogen activation peptide. The catalytic mechanism and the subsite that recognizes the P1 lysine residue are conserved with other chymotrypsin-like serine proteases, but the aspartyl side-chains at positions P2-P4 of the inhibitor are accommodated mainly by ionic interactions with a unique exosite on the enzyme surface. By mutagenesis and chemical modification, we demonstrate that a single lysyl side-chain within this exosite is required for the cleavage of trypsinogen and similar peptidyl substrates. These distinctive features of enteropeptidase illustrate the specificity that serine proteases can acquire by combining modifications of the protease domain with additional motifs on accessory domains.

Results 

Structure determination

The crystal structure of the serine protease domain of bovine enteropeptidase (L-BEK) bound to the inhibitor VD4K-cm was solved by molecular replacement using the structure of γ-chymotrypsin (PDB entry code 1GCD) (Harel et al., 1991) as the search model, to which enteropeptidase shows 35.9% sequence identity (Figure 1). The structure was refined to final R factors of R = 23.4% and Rfree = 26.9% (Figure 2 and Table 1). For ease of comparison to related serine protease structures, we use the chymotrypsin-derived residue numbering scheme proposed by Bode et al. (1992). The protein used for the present structure determination (L-BEK) contains only 13 C-terminal amino acid residues of the enteropeptidase heavy chain. Note that the usage of the terms "heavy" and "light" chain is the reverse of common usage for chymotrypsin and thrombin. The present structure shows an uninterrupted backbone for the two-chain molecule, comprising residues 1 through 7 (chymotrypsin numbering) of the N-terminal domain and residues 16 through 243 of the serine protease domain. Residues 8 through 13 of the N-terminal domain and residues 244 and 245 of the serine protease domain protrude freely into the solvent and could not be modeled.
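A percent-identity figure such as 35.9% depends on how aligned positions are counted, which the text does not specify. The sketch below assumes one common convention: identities are counted over alignment columns in which neither sequence has a gap. This is not necessarily the convention used here, and the two short sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        identical += a == b  # True counts as 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical toy alignment: 5 identities over 6 ungapped columns.
print(round(percent_identity("IVGGQEA", "IVNG-EA"), 1))
```

Dividing instead by the length of one full sequence would give a different value for the same alignment, so the denominator should be reported alongside the percentage.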

Tertiary structure 

As expected from its homology to other serine proteases, L-BEK is very similar in fold to both representative family members chymotrypsin and thrombin (Figure 3(a) and (c)): the tertiary structure consists of two six-stranded β-barrels, each of which makes up about one half of the entire molecule. The structure of L-BEK superimposes on chymotrypsin with a root-mean-square deviation of 1.10 Å for 224 Cα positions, and it superimposes on thrombin with a root-mean-square deviation of 1.23 Å for 234 Cα positions. Variations in secondary structure occur mainly in the loop regions. L-BEK also contains, relative to chymotrypsin, an additional β-strand, β1′, and an additional small 3₁₀-helix, 3₁₀1′ (Figures 1 and 3(a)). The 3₁₀-helix is part of the so-called "60-loop" that connects helix α1 and strand β4, and a similar 3₁₀-helix is present in the much longer 60-loop of thrombin.

Figure 1. Sequence alignment of enteropeptidase (L-BEK), chymotrypsin (Chymo) and thrombin (Thromb) protease domains. Amino acid sequences are aligned based on topological equivalence of the superimposed crystal structures. Amino acid residues are numbered based on the sequence of chymotrypsinogen. Residues of L-BEK and the other proteases are boxed if the separation between Cα positions is ≤1.6 Å. Active-site residues (His57, Asp102, Ser195) are in filled black boxes. Residues in contact with the VD4K-cm inhibitor are shaded in blue. L-BEK secondary structure elements are indicated below the sequences; helices (α-helix, 3₁₀-helix) are shown as filled boxes and β-strands are shown as open boxes. Secondary structure elements conserved with γ-chymotrypsin are numbered sequentially, and those designated by prime numbers (i.e. 3₁₀1′, β1′) are not present in γ-chymotrypsin. The arrow indicates the activation cleavage site that separates the heavy chain remnant (residues 1-15) from the light chain (residues 16-243).
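The r.m.s. deviations quoted above are measured after optimal rigid-body superposition of matched Cα atoms. The text does not say which program performed the fit. A standard way to compute such a value is the Kabsch (SVD-based) algorithm, sketched below with NumPy under that assumption. The random coordinates are synthetic and serve only to check the function.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal superposition of p onto q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    # Remove translation by centering both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


rng = np.random.default_rng(0)
coords = rng.normal(scale=10.0, size=(224, 3))  # synthetic stand-in for Cα atoms
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(coords, moved))  # close to zero: pure rotation plus translation
```

Passing only the three catalytic-triad Cα atoms, as in the superpositions of Figures 3 and 5, works the same way, since the method needs at least three non-collinear points.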

The enteropeptidase serine protease domain is stabilized by five disulfide bonds, all of which are conserved with chymotrypsin: Cys1-Cys122, Cys42-Cys58, Cys136-Cys201, Cys168-Cys182, and Cys191-Cys220 (Figure 3(a)). Thrombin lacks one of these disulfide bonds, corresponding to that between Cys136 and Cys201 of enteropeptidase. The 13-residue N-terminal chain of L-BEK is covalently linked to the serine protease domain by the disulfide bond between Cys1 and Cys122.
Figure 2. Representative regions of electron density. Simulated annealing omit maps, using Fourier coefficients Fo − Fc and model phases, were calculated by deleting the VD4K-chloromethane inhibitor either (a) alone or (b)-(c) including an additional region of 3.5 Å around it. (a) View of the inhibitor peptide from the protein outwards. Electron density for the hexapeptide is observed for positions P1-P4. (Amino acid residues of peptidyl substrates or inhibitors customarily are numbered P1, P2, P3, etc., from the scissile bond toward the N terminus, and P1′, P2′, on the C-terminal side of the scissile bond. The corresponding subsites on the cognate protease are numbered S1, S2, S3 and S1′, S2′ (Schechter & Berger, 1967).) (b) Interaction of the aspartyl side-chains of residues P2-P4 with Lys99 and Tyr174 of L-BEK. (c) Covalent linkage of the C terminus of the inhibitor to the catalytic residues His57 (Nε2-methylene carbon) and Ser195 (Oγ-carbonyl carbon atom), mimicking the tetrahedral intermediate of the hydrolysis reaction. The figure was produced with the program O (Jones & Thirup, 1986; Jones et al., 1991).

Aside from this single disulfide bond, the interactions of this short polypeptide with the bulk of the structure are relatively weak, consisting of an amino-aromatic interaction between Lys4 and Trp27, and hydrogen bonds between main-chain atoms of Gly2 and either Trp207 or Pro120. Consequently, the remaining residues 8-13 of the heavy chain are disordered.

The catalytic center

The catalytic center contains the signature structural elements of serine proteases: the catalytic triad consisting of Asp102, His57 and Ser195; the oxyanion hole formed by the main-chain amide nitrogen atoms of residues 193 and 195; and the S1 subsite or specificity pocket that interacts with the side-chain of the P1 substrate/inhibitor residue (Figure 4(a) and (d)). The VD4K-cm inhibitor is identical in sequence to the trypsinogen activation peptide and is covalently bound to the catalytic residues His57 and Ser195 through its C-terminal residue Lys-P1 (Figures 2(c) and 4(a)). The carbonyl carbon atom of Lys-P1 forms a tetrahedral hemiketal with Ser195 Oγ, and the methylene carbon atom of the inhibitor is bound to the imidazole ring (Nε2) of His57. This arrangement mimics the tetrahedral intermediate of the substrate hydrolysis reaction. The side-chain of Lys-P1 inserts deeply into the S1 pocket, at the bottom of which Asp189 neutralizes the terminal amino group (Figure 4(b)). The interactions of Lys-P1 at the bottom of the specificity pocket also include short hydrogen bonds to both the hydroxyl group and the carbonyl oxygen atom of Ser190. Lys-P1 also makes short hydrogen bonds to two water molecules, WAT438 and WAT407, that correspond to water molecules 429 and 494, respectively, of the thrombin-hirugen complex (Vijayalakshmi et al., 1994). These two water molecules are conserved among several serine protease structures (Krem & Di Cera, 1998). The aliphatic part of the Lys-P1 side-chain packs against the main-chain atoms of Phe215 and Ser214, as well as the Cγ2 atom of Thr213 (Figure 4(b) and (d)).

Table 1. Data collection and refinement statistics

A. Data collection
  Data set                                   Native
  Radiation, detector system                 CuKα, R-AXIS
  Resolution (Å)                             30-2.3
  Total/unique reflections                   28,051/10,541
  Completeness (%)(a)                        92.6 (89.2)
  Rmerge (%)(b)                              4.4 (8.8)

B. Refinement
  Resolution (Å)                             30.0-2.3
  Reflections (completeness)(c) (%)          9854 (87.6/82.0)
  Non-H atoms                                2023
  R/Rfree (%)(d)                             23.4/26.9
  r.m.s. deviations(e)
    Bond lengths (Å)                         0.006
    Bond angles (deg.)                       1.39
    B values (main-chain/side-chain) (Å²)    1.5/2.0

(a) Completeness for I/σ(I) > 10; value for the high-resolution shell (2.38-2.3 Å) in parentheses.
(b) Rmerge = Σ|I − ⟨I⟩| / ΣI, where I = observed intensity and ⟨I⟩ = average intensity from multiple observations of symmetry-related reflections; the value for the high-resolution shell is in parentheses.
(c) Numbers reflect the "working set" of reflections at F/σ(F) > 2.0; values for completeness for the overall/high-resolution shell (2.4-2.3 Å) are in parentheses.
(d) Rfree was calculated on the basis of 546 reflections (5.5% of the observed reflections) that were randomly omitted from the refinement.
(e) Root-mean-square (r.m.s.) deviation from ideal bond lengths and angles (Engh & Huber, 1991) and r.m.s. deviation in B-factors of bonded atoms.
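The R and Rfree values in Table 1 use the same formula, R = Σ| |Fo| − |Fc| | / Σ|Fo|, evaluated over different reflections: the working set used in refinement and a randomly omitted test set (here 546 reflections, 5.5% of the data). The sketch below illustrates that bookkeeping with made-up amplitudes. It is not the refinement program used in the study, and the helper names are hypothetical.

```python
import random


def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, fraction, seed=0):
    """Randomly flag a fraction of reflection indices as the test (free) set."""
    rng = random.Random(seed)
    n_free = round(n_reflections * fraction)
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


def r_work_and_free(f_obs, f_calc, work, free):
    """Evaluate R separately on the working and free reflection subsets."""
    def pick(indices, values):
        return [values[i] for i in indices]
    return (r_factor(pick(work, f_obs), pick(work, f_calc)),
            r_factor(pick(free, f_obs), pick(free, f_calc)))


# Made-up amplitudes: each Fc deviates from its Fo by up to +/-40%.
rng = random.Random(1)
f_obs = [rng.uniform(10, 100) for _ in range(2000)]
f_calc = [fo * (1 + rng.uniform(-0.4, 0.4)) for fo in f_obs]
work, free = split_free_set(len(f_obs), 0.055)
print(len(free), r_work_and_free(f_obs, f_calc, work, free))
```

In a real refinement the free reflections are excluded from fitting, so Rfree typically exceeds R, as in Table 1 (26.9% versus 23.4%). In this toy example no fitting takes place, so the two values come out similar.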

The extended substrate binding exosite

Despite its covalent attachment to the protein through the catalytic center, the VD4K-cm inhibitor is disordered at its N-terminal end, and electron density was observed only for residues Lys-P1 through Asp-P4 (Figure 2(a)). The inhibitor geometry is remarkably similar to that of D-Phe-Pro-Arg-chloromethane (PPACK) in thrombin, as illustrated in Figure 5. The alignment of L-BEK with thrombin, based only on the Cα atoms of the catalytic triad, leads to a near perfect superposition of the two inhibitor molecules, including the Cα positions, despite their complete lack of sequence similarity. Although VD4K-cm forms two main-chain to main-chain hydrogen bonds with residues in strand β11 (Figures 1 and 4(d)), it does not otherwise adopt a β-strand configuration, in contrast to what is observed for the thrombin-PPACK structure (Bode et al., 1992).

Figure 3. Overall fold of enteropeptidase compared to γ-chymotrypsin and α-thrombin. (a) Stereo ribbon diagram of L-BEK. The catalytic residues are labeled and the disulfide bonds are shown in yellow. Superposition of L-BEK (grey) with (b) γ-chymotrypsin (1GCD, in cyan) and (c) human α-thrombin (1PPB, in green). The structures were aligned with respect to the Cα positions of the catalytic residues His57, Asp102 and Ser195, and are shown in the same orientation as for L-BEK in (a). This Figure was produced with the program RIBBONS (Carson, 1997), as were Figures 4(a)-(c), 5, and 7.

Aside from the S1 subsite, the major determinant of VD4K-cm recognition is Lys99. The basic side-chain of this residue coordinates the aspartic acid side-chains at positions P2 through P4 of the inhibitor. These three carboxylate groups surround the terminal amino group of Lys99 in a fashion similar to an inverted tripod. Lys99 forms salt bridges only with Asp-P2 and Asp-P4, whereas Asp-P3 is hydrogen bonded to the hydroxyl moiety of Tyr174 (Figure 4(c) and (d)). Residue Phe215 is also indirectly involved in substrate binding, with its phenyl ring serving as a hydrophobic platform that supports the side-chain of Lys99 (Figures 2(b) and 4(c)).

Lys99 is part of a sequence of four basic amino acid residues in the β5β6 loop that, based on molecular modeling, had been predicted to define the substrate specificity of enteropeptidase (Kitamoto et al., 1994; Matsushima et al., 1994). In the present crystal structure the side-chain of Arg97 is completely disordered, that of Arg98 is poorly defined, and both extend into solvent. Lys96 does not make any close contacts with the inhibitor, but folds back onto the protein surface to form a short hydrogen bond (2.8 Å) with the hydroxyl group of Tyr94. Tyr60 also is in close proximity to the terminal amino group of Lys96. As discussed below, the contribution of these basic residues to substrate recognition was examined further by mutagenesis.
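Statements such as the 2.8 Å hydrogen bond between Lys96 and Tyr94 rest on interatomic distances measured in the refined model. The minimal sketch below assumes a simple donor-acceptor distance cutoff of 3.5 Å, a common rule of thumb that ignores hydrogen-bond angles; it is not the criterion stated by the authors. The coordinates are hypothetical, not taken from the deposited structure.

```python
import math

HBOND_CUTOFF = 3.5  # Å; assumed donor-acceptor cutoff, angles ignored


def is_hbond_distance(donor, acceptor, cutoff=HBOND_CUTOFF):
    """Return (distance, flag) for a donor/acceptor pair of (x, y, z) coordinates in Å."""
    d = math.dist(donor, acceptor)
    return d, d <= cutoff


# Hypothetical coordinates (Å): a Lys NZ atom and two candidate acceptors.
lys_nz = (10.0, 4.0, 7.0)
tyr_oh = (12.8, 4.0, 7.0)  # 2.8 Å away
far_o = (16.0, 4.0, 15.0)  # 10 Å away
for name, xyz in [("tyr_oh", tyr_oh), ("far_o", far_o)]:
    distance, ok = is_hbond_distance(lys_nz, xyz)
    print(f"{name}: {distance:.1f} Å, H-bond by distance: {ok}")
```

A fuller analysis would also check donor-hydrogen-acceptor angles; a distance-only test can flag pairs that are close but poorly oriented.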

The electrostatic surface of L-BEK (Figure 6) includes two prominent positive charges in the vicinity of the inhibitor binding site: Lys99 is on the N-terminal side and Arg60f is on the C-terminal side of the scissile bond position. Arg60f is held in place by hydrophobic interactions with the aromatic ring of Phe35 and a short hydrogen bond to the carbonyl oxygen atom of Cys58 (Figure 7). The latter interaction positions the guanidinium group of Arg60f at a distance of 8 Å from the catalytic center, where it would not be expected to have a direct effect on the recognition of VD4K-cm. In the superposition with thrombin (Figure 7), the Cα atom of Arg60f is closest to the Cα atom of Phe60h, but its guanidinium group lies close to the head group of Lys60f; the latter forms a hydrogen bond with the carbonyl oxygen atom of His57. The basic nature of these side-chains and their similar position relative to the catalytic center suggest that Arg60f of enteropeptidase and Lys60f of thrombin may have a similar function in recognition of residues C-terminal to the scissile bond. For thrombin, the effects of mutagenesis are consistent with this hypothesis, because alteration of Lys60f markedly impairs the cleavage of fibrinogen without affecting the cleavage of D-Phe-pipecolyl-Arg-p-nitroanilide (Wu et al., 1991).

Mutagenesis and chemical modification of L-BEK

To determine the contribution of specific basic amino acid residues to substrate recognition, mutant forms of L-BEK were prepared in which each of the Arg or Lys residues at positions 60f and 96-99 was changed to Ala. The proteins were expressed in a baculovirus system and purified by affinity chromatography on STI-agarose. In addition, a sample of purified L-BEK was treated with acetic anhydride. The conditions of acetylation were shown previously to result in the efficient modification of lysyl residues on porcine enteropeptidase (Baratti & Maroux, 1976). By SDS-polyacrylamide gel electrophoresis, all proteins appear to be homogeneous. Under non-denaturing conditions, acetylated L-BEK exhibits markedly increased electrophoretic mobility, consistent with the neutralization of amino groups (Figure 8).

Figure 4. Close-up view of the inhibitor binding site. (a) The C-terminal Lys (K-P1) of the inhibitor is covalently bound (thick lines in magenta) to His57 and Ser195 of L-BEK. The carbonyl oxygen atom of Lys-P1 (K-P1) forms hydrogen bonds (thin cyan lines) with water WAT436 and the main-chain nitrogen atoms of Ser195 and Gly193, the latter being part of the "oxyanion hole". (b) S1 recognition pocket showing protein residues in contact with K-P1. (c) Stereo view of the P2-P4 binding sites. The side-chain of Arg97 is disordered and modeled as Ala. Inhibitor residues are labeled in magenta throughout, and protein residues are labeled in black. Atom color coding: carbon, grey; oxygen, red; nitrogen, blue; sulfur, green; and water molecules, yellow spheres. (d) Schematic diagram of protein-inhibitor interactions. Broken lines indicate contacts for which the distances are given in Angstroms.

Each of these proteins cleaved the small ester Z-Lys-SBzl with nearly normal kinetics, demonstrating that the catalytic center was intact (Figure 9 and Table 2). Cleavage of the larger substrates Gly-(Asp)4-Lys-β-naphthylamide (GD4K-na) and trypsinogen was decreased minimally by the substitutions Arg97Ala and Arg98Ala. The mutations Arg60fAla and Lys96Ala decreased the catalytic efficiency of GD4K-na cleavage by up to approximately fivefold (Table 2) and similarly decreased the relative rate of trypsinogen activation (Figure 9), indicating a modest change (+0.8 to +1.0 kcal mol⁻¹) in the free energy of transition state binding, ΔG‡ (Wilkinson et al., 1983). However, activity toward both of these substrates was essentially abolished by the mutation Lys99Ala. Accurate kinetic constants could not be determined for this mutation (Table 2); the low relative activity (Figure 9) toward both GD4K-na (~3%) and trypsinogen (~1.5%) suggests that removal of this lysyl side-chain increases ΔG‡ by 2.1 to 2.5 kcal mol⁻¹. Acetylation of L-BEK also markedly decreased the rate of cleavage of both GD4K-na (~13%) and trypsinogen (~1.5%), but enhanced the cleavage of Z-Lys-SBzl (Figure 9 and Table 2).

Figure 5. Structural superposition of the VD4K-cm inhibitor of enteropeptidase with D-Phe-Pro-Arg-chloromethane (PPACK) of thrombin (1PPB). The alignment resulted from the superposition of the Cα positions of the catalytic residues His57, Asp102, and Ser195 in both proteins. Enteropeptidase residues and inhibitor atoms are shown in color-coded sticks: grey for C, red for O, blue for N. Residues and inhibitor atoms of thrombin are shown in green sticks. The view is from the protein outwards.

Rate constants for inhibition by VD4K-cm also were determined to assess the effect of mutations on the recognition of the trypsinogen activation peptide (Table 3). The magnitude and direction of the changes are similar to those observed for cleavage of GD4K-na and trypsinogen. The substitutions Arg60fAla, Lys96Ala, Arg97Ala, and Arg98Ala had modest effects on the inhibition reaction, increasing ΔG‡ by 0.3 to 0.8 kcal mol⁻¹. In contrast, the mutation Lys99Ala markedly reduced the rate of inhibition, increasing ΔG‡ by 1.8 kcal mol⁻¹. Acetylation of L-BEK also markedly slowed the rate of inhibition by VD4K-cm, increasing ΔG‡ by 2.7 kcal mol⁻¹. These values of ΔΔG‡ for inhibition by VD4K-cm are consistent with those estimated from the relative rates of substrate cleavage (Figure 9).
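The ΔΔG‡ values follow from ratios of specificity constants, ΔΔG‡ = RT ln[(k/K)wild type / (k/K)variant] (Wilkinson et al., 1983), using kcat/Km for substrate cleavage and k2/Ki for inhibition by VD4K-cm. The assay temperature is not given in this part of the text, so the sketch assumes 25 °C. Fed the k2 and Ki values of Table 3 and the GD4K-na kcat/Km values of Table 2, it reproduces the reported ΔΔG‡ values to within about 0.1 kcal mol⁻¹.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1
T_ASSUMED = 298.15    # K; assumed 25 °C, not stated in the text


def ddg_kcal(wild_type_specificity, variant_specificity, temperature=T_ASSUMED):
    """ΔΔG‡ = RT ln(wt/variant) in kcal/mol; positive when the variant is impaired."""
    return R_KCAL * temperature * math.log(wild_type_specificity / variant_specificity)


# (k2, Ki) pairs from Table 3; units cancel in the wild-type/variant ratio.
table3 = {
    "L-BEK": (0.013, 1.0),
    "Acetyl L-BEK": (0.0010, 7.3),
    "R60fA": (0.061, 17.0),
    "K96A": (0.0048, 0.9),
    "R97A": (0.0073, 1.0),
    "R98A": (0.0072, 0.84),
    "K99A": (0.00024, 0.4),
}
wt = table3["L-BEK"][0] / table3["L-BEK"][1]
ddg_inhibition = {
    name: ddg_kcal(wt, k2 / ki) for name, (k2, ki) in table3.items() if name != "L-BEK"
}

# kcat/Km for GD4K-na from Table 2 (mM^-1 s^-1).
ddg_gd4k = {"R60fA": ddg_kcal(70.4, 17.3), "K96A": ddg_kcal(70.4, 13.7)}

for name, value in ddg_inhibition.items():
    print(f"{name}: {value:+.1f} kcal/mol")
```

Because RT at 25 °C is about 0.59 kcal mol⁻¹, a tenfold loss in specificity corresponds to roughly 1.4 kcal mol⁻¹, which makes the large penalty for Lys99Ala and acetylation easy to read off the rate ratios.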

Discussion 

Structural interpretation of substrate specificity

Limited qualitative studies employing protein substrates (Anderson et al., 1977; Light et al., 1980) and synthetic peptides (Maroux et al., 1971) indicate that mammalian enteropeptidase is remarkably specific. With few exceptions, the P1 residue must be basic (e.g. Lys, Arg, or homoarginine) and the P2 and P3 positions must be acidic (e.g. Asp, Glu or carboxymethylcysteine). The substituents at P4 and P5 are less critical, but additional acidic residues in these positions increase affinity for the enzyme (Maroux et al., 1971).

The crystal structure of L-BEK provides a reasonable explanation for these properties. The catalytic center of enteropeptidase is conserved with related enzymes that prefer a basic side-chain in the P1 position, such as trypsin, and Lys-P1 of the inhibitor VD4K-cm makes numerous close contacts with L-BEK (Figure 4(d)). Acidic residues on the N-terminal side of residue P1 interact with an extended exosite on the enzyme surface, and the number of contacts decreases as the distance from the catalytic center increases. For example, Asp-P2 main-chain atoms make four close contacts with L-BEK, and its carboxylate side-chain makes two H-bonds with the Nζ atom of Lys99; Asp-P3 makes half as many contacts; Asp-P4 makes only one H-bond, between its carboxylate group and the Nζ atom of Lys99; and residues Asp-P5 and Val-P6 are disordered. Thus, the interface between L-BEK and VD4K-cm is consistent with the increased tolerance for variations in substrate structure at positions distal to P3.

The distribution of interactions between VD4K-cm and bovine enteropeptidase is mirrored by the observed variation among trypsinogen activation peptides. Sequences are known for at least 30 genetically distinct trypsinogens, representing mammals, birds, amphibians and fish (Bricteux-Gregoire et al., 1972; Lu & Sadler, 1998). Position P1 is occupied almost exclusively by Lys. Very few trypsinogens have Glu instead of Asp at position P2 or P3. Most residues at position P4 are Asp, but Glu or Asn occur in ~30% of cases. Position P5 shows more variation; Asp is present in ~60%, but aromatic, aliphatic, small polar and basic side-chains also are found. Position P6 is not conserved. Therefore, the tendency of trypsinogen activation peptide residues to vary during vertebrate evolution correlates inversely with the number and location of close contacts in the L-BEK-VD4K structure.

Figure 6. Electrostatic surface diagram of the Val-(Asp)4-Lys-chloromethane inhibitor binding site of enteropeptidase. Negative and positive surface charges are shown in deep red and blue, respectively, with linear interpolation in between. Conserved water molecules WAT407 and WAT438 are shown as spheres in cyan; inhibitor atoms are shown as sticks and are color-coded as described in the legend to Figure 4. (a) Overall view. (b) Close-up view of the S1 binding pocket. The Figure was produced with the program GRASP (Nicholls et al., 1991).

Figure 7. Structural role of residue Arg60f in comparison to Lys60f of thrombin. Enteropeptidase secondary structure elements are shown in grey and atoms are color-coded as described in the legend to Figure 4. Secondary structure elements and carbon atoms of thrombin are shown in green, keeping all other atom color assignments unaltered. The structures were aligned as shown in Figure 3. Interestingly, Arg60f aligns with Phe60h of thrombin with regard to the Cα position, while its guanidinium group is very close to the terminal amino group of Lys60f of thrombin.
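The comparison above relates how often each activation-peptide position varies across species to how many contacts it makes with the enzyme. A minimal sketch of the per-position tally, assuming the peptides are aligned at their C-terminal P1 residue. The five sequences are a toy set: the first is the bovine activation peptide VDDDDK discussed in the text, and the rest are invented for illustration, not drawn from the ~30 published trypsinogen sequences.

```python
from collections import Counter


def position_conservation(peptides):
    """Most common residue and its frequency at P1, P2, ... for peptides aligned
    at their C-terminal (P1) residue."""
    longest = max(len(p) for p in peptides)
    summary = {}
    for offset in range(1, longest + 1):
        # Count from the C terminus so that the last residue is P1.
        residues = [p[-offset] for p in peptides if len(p) >= offset]
        residue, count = Counter(residues).most_common(1)[0]
        summary[f"P{offset}"] = (residue, count / len(residues))
    return summary


# Toy activation peptides; only VDDDDK (bovine) comes from the text.
toy_peptides = ["VDDDDK", "GDDDDK", "ADEDDK", "SNDDDK", "FGDDDK"]
for label, (residue, freq) in position_conservation(toy_peptides).items():
    print(label, residue, f"{freq:.0%}")
```

Run on a real alignment, the same tally would quantify the pattern described above: invariant Lys at P1, near-invariant Asp at P2 and P3, and increasing variation from P4 outward.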



Figure 8. Gel electrophoresis of enteropeptidase variants. (a) Samples (5 µg) of affinity purified enteropeptidase variants were analyzed by SDS-polyacrylamide gel electrophoresis without reducing agent and visualized by staining with Coomassie brilliant blue (Laemmli, 1970). The positions of molecular mass markers are indicated at the right in kilodaltons. (b) Enteropeptidase variants were analyzed by native gel electrophoresis using a similar polyacrylamide gel and buffer system except that SDS was omitted from the sample buffer.



Energetic contributions of specific residues to substrate recognition

The contacts between L-BEK and VD4K-cm are dominated by ionic interactions between aspartyl side-chains and Lys99, and the importance of these interactions is supported by the effect of acetylation on enteropeptidase specificity. Reaction of porcine enteropeptidase with acetic anhydride reduces its activity toward trypsinogen by more than 98%, but increases its activity toward L-N-α-benzoylarginine p-nitroanilide (L-BAPNA) 1.8-fold (Baratti & Maroux, 1976). These studies were performed with full-length enteropeptidase and therefore could not localize the critical modified residues to either the light chain or the heavy chain. However, we found that acetylated L-BEK has a similar phenotype: it cleaves the simple thioester substrate Z-Lys-SBzl more rapidly than does native L-BEK (Table 2), but cannot cleave either GD4K-na or trypsinogen (Figure 9). Thus, residues in the enteropeptidase light chain that are sensitive to acetylation, such as Lys or Tyr, are necessary for the recognition of peptidyl substrates. The best candidate target to explain the effect of acetylation is Lys99, which makes at least three H-bonds with Asp-P2 and Asp-P4 in the L-BEK-VD4K complex (Figure 4(d)). The other possibility, Tyr174, makes only a single H-bond with Asp-P3.

Figure 9. Relative rates of substrate cleavage by enteropeptidase variants. The activity of the indicated preparations of enteropeptidase light chain was assayed with the substrates Z-Lys-SBzl (open boxes), GD4K-na (filled boxes), and trypsinogen (hatched boxes). The values obtained are expressed as the mean percentage ± SE for at least three independent determinations, normalized to the activity observed for wild-type L-BEK (100%).

Mutagenesis and kinetic studies support a major contribution of Lys99 to the energetics of substrate binding. Substitution of Lys99 by alanine caused similar impairments in the ability of enteropeptidase to cleave either GD4K-na or trypsinogen (Figure 9 and Table 2), and in the rate of enteropeptidase inhibition by VD4K-cm (Table 3). For the latter reaction, the Lys99Ala mutation increased ΔG‡ by 1.8 kcal mol⁻¹ and acetylation of L-BEK increased ΔG‡ by 2.7 kcal mol⁻¹. Mutations at other positively charged residues have much smaller effects on the kinetics of substrate cleavage or inhibition by VD4K-cm. The similar phenotypes of acetylated L-BEK and the Lys99Ala mutant are consistent with the importance of ionic interactions in the recognition of substrate residues in the P2-P4 positions, and suggest that the effects of acetylation are due mainly to the loss of positive charge at Lys99.

A hierarchy of functional sites participates in substrate recognition

The extended contacts between L-BEK and VD4K-cm appear to explain the preference of enteropeptidase for similar peptidyl substrates, but do not fully account for the efficient activation of trypsinogen. Two-chain enteropeptidase cleaves trypsinogen ~500-fold more rapidly than does the isolated light chain (Lu et al., 1997), indicating that the heavy chain promotes physiological substrate recognition. Thus, a hierarchy of functional sites has evolved to optimize trypsinogen activation. The catalytic center confers specificity for cleavage after basic amino acid residues. An exosite on the light chain, distinct from the catalytic center, recognizes acidic trypsinogen activation peptides, and at least one site on the heavy chain interacts with and further accelerates the cleavage of trypsinogen. This feature of the enteropeptidase-trypsinogen interaction is shared by many other serine proteases that participate in highly regulated metabolic pathways, and it illustrates general principles underlying the adaptation of serine proteases to cleave a restricted range of substrates. Such adaptation often has been accomplished by exploiting structural features of both catalytic and non-catalytic domains to interact with complementary surfaces on cofactors or substrates.

Materials and Methods 

Reagents and proteins 

Bovine trypsinogen and bovine trypsin were from Worthington (Freehold, NJ). Thiobenzyl benzyloxycarbonyl-L-lysinate (Z-Lys-SBzl) and the enteropeptidase substrate Gly-Asp-Asp-Asp-Asp-Lys-β-naphthylamide (GD4K-na) were from Bachem (King of Prussia, PA). Chromogenic substrates S-2366 (pyroGlu-Pro-Arg-p-nitroanilide) and S-2765 (Z-D-Arg-Gly-Arg-p-nitroanilide) were from Chromogenix (Sweden). Ovomucoid, soybean trypsin inhibitor agarose (STI-agarose), acetic anhydride, p-nitrophenyl p′-guanidinobenzoate, and 5,5′-dithiobis(2-nitrobenzoic acid) (DTNB) were from Sigma (St. Louis, MO).



Table 2. Kinetic parameters for the cleavage of substrates Z-Lys-SBzl and GD4K-na

                 Z-Lys-SBzl                                 GD4K-na
Enzyme           Km (µM)    kcat (s⁻¹)  kcat/Km (µM⁻¹ s⁻¹)  Km (mM)      kcat (s⁻¹)  kcat/Km (mM⁻¹ s⁻¹)
L-BEK            120 ± 10   129 ± 4     1.05                0.61 ± 0.09  42.7 ± 4.0  70.4
Acetyl L-BEK     40 ± 10    111 ± 4     2.93                NA           NA          NA
R60fA            120 ± 10   159 ± 19    1.36                0.73 ± 0.08  12.7 ± 1.0  17.3
K96A             100 ± 30   108 ± 22    1.10                1.25 ± 0.07  17.1 ± 1.5  13.7
R97A             120 ± 40   128 ± 33    1.02                0.66 ± 0.07  25.5 ± 2.3  38.6
R98A             140 ± 10   128 ± 3     0.88                0.77 ± 0.02  39.1 ± 0.8  51.0
K99A             50 ± 10    120 ± 1     2.53                NA           NA          NA

Values for Km and kcat are expressed as the mean ± SE of three independent determinations. NA, activity insufficient to determine kinetic constants.
Table 3. Kinetic parameters for the inhibition of enteropeptidase

Enzyme           k2                  Ki           k2/Ki         ΔΔG‡ (kcal mol⁻¹)
L-BEK            0.013 ± 0.003       1.0 ± 0.3    …             –
Acetyl L-BEK     0.0010 ± 0.0001     7.3 ± 1.2    0.15 ± 0.02   +2.7
R60fA            0.061 ± 0.015       17 ± 5       3.59 ± 0.08   +0.8
K96A             0.0048 ± 0.0008     0.9 ± 0.3    5.9 ± 1.3     +0.5
R97A             0.0073 ± 0.0015     1.0 ± 0.3    7.5 ± 0.4     +0.3
R98A             0.0072 ± 0.0002     0.84 ± 0.04  8.7 ± 0.2     +0.3
K99A             0.00024 ± 0.00001   0.4 ± 0.2    0.6 ± 0.1     +1.8

Values for Ki and k2 are expressed as the mean ± SE of at least three independent determinations.



Plasmid constructs 

Plasmid pBlue-newL was prepared from pBEK by a PCR mutagenesis strategy as described (Lu et al., 1997; Nelson & Long, 1989) and encodes the human prothrombin signal peptide (Met1-Phe28) fused to the carboxyl-terminal 251 amino acid residues of bovine enteropeptidase (Tyr785-His1035) (Kitamoto et al., 1994). Using a similar mutagenesis method, plasmid pBlue-newL was altered to contain mutations encoding each of the amino acid substitutions Arg60fAla, Lys96Ala, Arg97Ala, Arg98Ala, and Lys99Ala. The segment encoding the chimeric prothrombin-enteropeptidase construct was excised from each plasmid by digestion with HindIII, made blunt with DNA polymerase, and ligated into the SmaI site of the expression vector pVL1392 (Pharmingen, Carlen, CA) to yield plasmids pVLnewL, pVLR60fA, pVLK96A, pVLR97A, pVLR98A, and pVLK99A.

A fragment of plasmid pBEK encoding amino acid residues Cys788-His1035 of bovine enteropeptidase (Kitamoto et al., 1994) was amplified by PCR and inserted into the NcoI site of expression vector pET-11d (Novagen, Madison, WI) to yield plasmid pETL. The construct encodes two amino acid residues derived from the vector (Met-Ala) before commencing with enteropeptidase sequence at Cys788. For all plasmids, the segments derived by PCR were sequenced to confirm the accuracy of the construction.



Production of enteropeptidase light chain in Escherichia coli (L-BEK)

E. coli BL21(DE3) cells (Stratagene) containing pETL were grown in two liters of LB/ampicillin medium, and recombinant L-BEK was solubilized from the inclusion bodies at room temperature with 10 ml of 0.1 M Tris-HCl (pH 8.6), 1 mM EDTA-Na, 150 mM dithioerythritol, and 6 M guanidine-HCl. L-BEK was refolded by a modification of a protocol described for the refolding of tissue plasminogen activator from lysates of E. coli (Kohnert et al., 1992). After centrifugation for 30 minutes at 50,000 g, the solubilized protein was dialyzed at room temperature against 3 M guanidine-HCl (pH 2.5), and mixed with 10 ml of oxidation buffer (50 mM Tris-HCl (pH 9.3), 6 M guanidine-HCl, 0.1 M oxidized glutathione). After dialysis against 3 M guanidine-HCl (pH 8.0), disulfide exchange and refolding were initiated by dropwise dilution with stirring into 500 ml of 0.7 M arginine-HCl (pH 8.6), 2 mM reduced glutathione, and 1 mM EDTA. After 72 hours, the reaction was dialyzed against 20 mM Tris-HCl (pH 7.6), 20 mM NaCl, and then digested with trypsin (1:50 molar ratio) for one hour. The trypsin was inactivated with a fourfold excess of ovomucoid, and active L-BEK was purified to homogeneity by affinity chromatography on STI-agarose. The yield was 10 mg per two-liter culture.

The N-terminal amino acid sequence of L-BEK was determined after SDS-PAGE and electroblotting onto a polyvinylidene difluoride membrane (Kalafatis & Mann, 1993). The product had the expected two-chain structure, and the predicted first Met residue was removed completely during biosynthesis. The mass of L-BEK was 27,741 Da by electrospray ionization mass spectrometry, consistent with the calculated mass of 27,739.6 Da. The concentration of L-BEK determined by active-site titration with p-nitrophenyl p'-guanidinobenzoate (Chase & Shaw, 1970) agreed with the value determined spectrophotometrically at 280 nm using the calculated extinction coefficient (Pace et al., 1995) of 70,870 M⁻¹ cm⁻¹.
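The spectrophotometric concentration check above applies the Beer-Lambert law, c = A/(ε·l). The sketch below converts a blank-corrected A280 reading into molar and mass concentration using the ε280 of 70,870 M⁻¹ cm⁻¹ and the 27,741 Da mass quoted in the text. The 1 cm default path length and the function names are assumptions made for illustration, and the absorbance in the example is hypothetical.

```python
# Beer-Lambert sketch for L-BEK: c = A / (epsilon * path length).
EPSILON_280 = 70_870.0  # M^-1 cm^-1, calculated extinction coefficient quoted in the text
MOLAR_MASS = 27_741.0   # g/mol, ESI-MS mass of L-BEK quoted in the text


def molar_concentration(a280: float, path_cm: float = 1.0) -> float:
    """Concentration in mol/L from a blank-corrected absorbance at 280 nm."""
    return a280 / (EPSILON_280 * path_cm)


def mg_per_ml(a280: float, path_cm: float = 1.0) -> float:
    """Mass concentration: (mol/L) * (g/mol) = g/L, which equals mg/ml."""
    return molar_concentration(a280, path_cm) * MOLAR_MASS


if __name__ == "__main__":
    a280 = 0.5  # hypothetical reading, not a value from the paper
    print(f"{molar_concentration(a280) * 1e6:.2f} uM, {mg_per_ml(a280):.3f} mg/ml")
```

Agreement with active-site titration, as reported above, indicates that essentially all of the protein measured this way is catalytically active.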



Production of wild-type and mutant enteropeptidase in baculovirus

Constructs pVLnewL, pVLR60fA, pVLK96A, pVLR97A, pVLR98A, and pVLK99A were cotransfected with BaculoGold DNA (Pharmingen) into Sf9 cells, and high-titer recombinant baculovirus was prepared by repeated infection. High Five cells (1 × 10^6 per ml, Invitrogen) were grown in Express Five serum-free medium supplemented with 20 mM glutamine. Suspension cultures (200 ml each) were infected with 0.5 ml of virus stock. After 72 hours, the conditioned medium was collected and adjusted to pH 8.0 by addition of ~20 ml/l of 1 M Tris-HCl (pH 8), and precipitated glutamine was removed by centrifugation. Recombinant enteropeptidase was purified by affinity chromatography on STI-agarose. The yield was up to ~15 mg of apparently homogeneous enteropeptidase light chain per liter of medium.



Affinity purification of enteropeptidase light chain 
variants on STI-agarose 

High Five cell conditioned medium (1000 ml) was applied at 50 ml/hour to a column (2 ml) of STI-agarose equilibrated with 20 mM Tris-HCl (pH 7.5), 50 mM NaCl, at 4 °C. The column was washed with 10 ml of 20 mM Tris-HCl (pH 7.5), 1 M NaCl, followed by 50 ml of 20 mM Tris-HCl (pH 7.5). Enteropeptidase was eluted with 50 mM glycine-HCl (pH 3.0); 1 ml fractions were collected and neutralized immediately with 50 µl of 2 M Tris-HCl (pH 8.0). Refolded and trypsin-activated L-BEK prepared in E. coli was purified similarly, applying the product obtained from a two-liter culture to the column. Fractions were analyzed by SDS-PAGE (Laemmli, 1970) and silver staining (Morrissey, 1981), pooled, dialyzed against 20 mM Tris-HCl (pH 7.5), 50 mM NaCl, and stored at -70 °C.



Preparation of a stoichiometric complex of L-BEK 
and VDDDDK-chloromethane 

The active-site-directed inhibitor Val-(Asp)4-Lys-chloromethane (VD4K-cm) was synthesized (Haematologic Technologies, Inc.), and its structure was confirmed by amino acid composition. Electrospray ionization mass spectrometry gave a mass of 739.3 Da; the predicted mass was 739.2 Da. Affinity-purified L-BEK from E. coli (10 mg) in 100 ml of 20 mM Tris-HCl (pH 7.5), 50 mM NaCl, was reacted on ice with 50 ml of 100 µM VD4K-cm added dropwise over 60 minutes. The L-BEK-VD4K complex was dialyzed at 4 °C against 20 mM Tris-HCl (pH 7.5), 50 mM NaCl, and concentrated to 25 mg/ml by ultrafiltration (Centricon-30, Amicon). The mass determined by electrospray ionization mass spectrometry (28,448 Da) was consistent with the mass calculated for the expected stoichiometric complex (28,442.3 Da).
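The calculated mass of the complex can be reproduced with simple arithmetic. This requires assuming, as is usual for peptidyl chloromethyl ketones, that covalent attachment to the active site releases one HCl. The sketch below adds the calculated L-BEK mass (27,739.6 Da) to the predicted inhibitor mass (739.2 Da) quoted above and subtracts the average mass of HCl. The HCl-loss assumption and the helper name are ours and are not taken from the text.

```python
# Masses quoted in the text (Da)
L_BEK_CALC = 27_739.6   # calculated mass of L-BEK
VD4K_CM_PRED = 739.2    # predicted mass of Val-(Asp)4-Lys-chloromethane
HCL = 1.008 + 35.453    # average mass of HCl, assumed to be released on alkylation


def complex_mass(enzyme: float, inhibitor: float, leaving: float = HCL) -> float:
    """Mass of a 1:1 covalent adduct when `leaving` is released on bond formation."""
    return enzyme + inhibitor - leaving


if __name__ == "__main__":
    print(f"{complex_mass(L_BEK_CALC, VD4K_CM_PRED):.1f} Da")
```

Under this assumption the result is 28,442.3 Da, which matches the calculated value in the text. The observed mass of 28,448 Da lies about 6 Da above it.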



Crystallization of L-BEK and data collection 

Crystals of the L-BEK-VD4K complex were grown at 20 °C in hanging drops against a reservoir of 100 mM sodium cacodylate (pH 5.0), 10 mM zinc sulfate, and 10% (w/v) PEG-400 at a protein concentration of 4 mg/ml. The crystals were orthorhombic (P2₁2₁2₁) with one molecule per asymmetric unit and cell dimensions of a = 39.99 Å, b = 70.65 Å, and c = 85.22 Å. A crystal was transferred into cryoprotectant buffer containing 100 mM sodium cacodylate (pH 5.0), 20 mM zinc sulfate, and 25% (w/v) PEG-400 and frozen at 100 K in a stream of nitrogen vapor. Data were collected using a Rigaku RaxisII image plate detector mounted on a Rigaku RU200 rotating copper anode. A data set complete to 2.3 Å resolution was collected. Data were processed and scaled using the programs DENZO and SCALEPACK (Otwinowski & Minor, 1996).



Structure determination and refinement 

Initial phases for the structure of L-BEK were obtained by molecular replacement, using the program AMoRe (Navaza, 1994) and the crystal structure of γ-chymotrypsin (PDB entry code 1GCD) (Harel et al., 1991) as the search model. A strong unique solution was found, with correlation factors of 0.38 and 0.17 for the highest and second-highest peak, respectively. Rigid-body refinement followed by positional refinement using X-PLOR (Brünger, 1992) resulted in values for R and Rfree of 43.0% and 49.2%, respectively.

The rebuilding process, using the program O (Jones & Thirup, 1986; Jones et al., 1991), started by aligning the primary sequences of L-BEK and γ-chymotrypsin. The model was modified by removing the diethyl phosphate inhibitor from the chymotrypsin structure, trimming loop regions of poor sequence conservation, and then substituting the γ-chymotrypsin residues either by alanine or by their proper counterparts in L-BEK, depending on the degree of sequence conservation. Further decreases in R and Rfree were achieved by using the structure of thrombin (PDB entry code 1PPB) (Bode et al., 1992) as a guide in regions where sequence conservation with L-BEK suggested structural similarity, building the Cα trace into 2Fo − Fc maps. At this point the value for Rfree dropped to 38.5%, and R decreased to 33.5%.

With the Cα trace in place, the model was subjected to two rounds of rebuilding guided by simulated annealing omit maps (Hodel et al., 1992), in order to eliminate model bias from the initial search model, with intermittent positional refinement using the maximum likelihood target in the program CNSsolve 0.5 (Brünger et al., 1998). This resulted in a value for Rfree of 33.5%, which decreased to 31.5% after individual B-factor refinement. A total of 45 water molecules were added to the model and verified by inspection of the 2Fo − Fc electron density map. Two large spherical patches of electron density, clamped between acidic side-chains of symmetry-related molecules, were interpreted as Zn²⁺, consistent with the presence of 20 mM zinc sulfate in the cryoprotectant solution. Their incorporation into the model led to a small but significant decrease of both the R and Rfree factors. The inhibitor Lys residue could be seen in 2Fo − Fc maps at an early stage of the building process, yet the remaining five residues were elusive until later in the refinement process. Eventually, residues Lys-P1 through Asp-P4 could be built unequivocally into simulated annealing omit maps, with density missing for the two N-terminal amino acid residues of the inhibitor, Asp-P5 and Val-P6. The final model comprises residues 1 through 7 of the heavy chain, residues 16 through 243 of the serine protease domain of enteropeptidase, residues P1 through P4 of the VD4K-cm inhibitor, two Zn²⁺ ions, and 108 water molecules. The side-chains of Lys3, Arg97, and Asn205 lacked electron density and were built as Ala. After bulk solvent correction and individual B-factor refinement, the model converged to R = 23.4% and Rfree = 26.9% for the resolution range 30-2.3 Å, using a cut-off of F/σ(F) > 2.0, with excellent stereochemistry and appropriately restrained B-factors (Table 1). There are no residues in disallowed regions of the Ramachandran plot, and only two residues in generously allowed regions.
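R and Rfree track the progress of the refinement described above. Both are the conventional crystallographic residual R = Σ| |Fo| − |Fc| | / Σ|Fo|: R is evaluated on the working reflections and Rfree on a held-out test set. The minimal sketch below assumes the calculated amplitudes are already scaled to the observed ones. Refinement programs such as X-PLOR and CNS apply their own scaling and bulk-solvent models. The toy amplitudes are hypothetical.

```python
from typing import Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Conventional R = sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes f_calc is already on the same scale as f_obs.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need equal-length, non-empty amplitude lists")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_and_free(
    f_obs: Sequence[float], f_calc: Sequence[float], in_test_set: Sequence[bool]
) -> Tuple[float, float]:
    """R over the working reflections and Rfree over the held-out test reflections."""
    work = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, in_test_set) if not t]
    test = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, in_test_set) if t]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


if __name__ == "__main__":
    fo = [120.0, 85.0, 40.0, 210.0, 66.0]   # hypothetical observed amplitudes
    fc = [110.0, 90.0, 35.0, 200.0, 70.0]   # hypothetical calculated amplitudes
    flags = [False, False, True, False, True]
    print(r_work_and_free(fo, fc, flags))
```

Because the test reflections never enter refinement, Rfree rises less readily than R from over-fitting. This is why the text reports both quantities at each stage.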

Preparation of acetylated enteropeptidase light chain 

Purified L-BEK from baculovirus (5.5 µM, 4 ml) in 0.1 M sodium phosphate (pH 7.0) was stirred on ice with 6 µl of acetic anhydride added in three portions. The reaction was maintained at pH 7.0 by the dropwise addition of sodium hydroxide. After one hour, the reaction was dialyzed against 20 mM Tris-HCl (pH 7.6), 20 mM NaCl.



Enzyme kinetics 

The concentration of each enteropeptidase was determined by active-site titration with p-nitrophenyl p'-guanidinobenzoate (Chase & Shaw, 1970). Kinetic parameters for cleavage of Z-Lys-SBzl were obtained as described (Green & Shaw, 1979). Assays were performed at room temperature in 1 ml of 0.1 M Tris-HCl (pH 8.0), 260 µM DTNB, and 10 µM to 500 µM Z-Lys-SBzl. The reaction was initiated by adding enzyme (0.2 to 1.6 nM), and the rate of 3-carboxy-4-nitrophenoxide production was calculated from the absorbance at 412 nm, using an extinction coefficient of 13,600 M⁻¹ cm⁻¹.
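In this coupled assay, the benzyl thiol released from Z-Lys-SBzl reacts with DTNB, and the 412 nm chromophore is quantified with the 13,600 M⁻¹ cm⁻¹ coefficient given above. The sketch below converts an absorbance slope into a rate, and then into a per-enzyme turnover using the active-site-titrated enzyme concentration. It assumes a 1 cm path and one chromophore per cleaved thioester. The function names and the example slope are hypothetical.

```python
EPSILON_412 = 13_600.0  # M^-1 cm^-1, value quoted in the text


def rate_uM_per_min(delta_a412_per_min: float, path_cm: float = 1.0) -> float:
    """Product formation rate (uM/min) from the absorbance slope at 412 nm.

    Assumes one chromophore is released per substrate molecule cleaved.
    """
    return delta_a412_per_min / (EPSILON_412 * path_cm) * 1e6


def turnover_per_s(rate_um_min: float, enzyme_nM: float) -> float:
    """v/[E] in s^-1 from a rate in uM/min and an active-site concentration in nM."""
    rate_um_s = rate_um_min / 60.0
    enzyme_um = enzyme_nM / 1000.0
    return rate_um_s / enzyme_um


if __name__ == "__main__":
    v = rate_uM_per_min(0.0136)  # hypothetical slope in A412 units per minute
    print(f"{v:.3f} uM/min, {turnover_per_s(v, 1.0):.1f} s^-1 at 1 nM enzyme")
```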

Kinetic parameters for the cleavage of the synthetic peptide substrate GD4K-na were determined as described (Grant & Hermon-Taylor, 1979; Lu et al., 1997). Values for kcat and Km were obtained by directly fitting the data to the Michaelis-Menten equation by non-linear least-squares regression. Under all assay conditions, the consumption of substrate (Z-Lys-SBzl or GD4K-na) was <15% of the total.
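The direct non-linear fit mentioned above can be sketched without external libraries. The version below is an illustration, not the authors' software. It takes initial estimates from a Hanes-Woolf plot and refines Vmax and Km by Gauss-Newton least squares on v = Vmax[S]/(Km + [S]). kcat then follows as Vmax divided by the active-site-titrated enzyme concentration. The function names and the synthetic test data are hypothetical.

```python
from typing import Sequence, Tuple


def _hanes_woolf(s: Sequence[float], v: Sequence[float]) -> Tuple[float, float]:
    """Initial (Vmax, Km) from the linear fit s/v = s/Vmax + Km/Vmax."""
    y = [si / vi for si, vi in zip(s, v)]
    n = len(s)
    mean_x, mean_y = sum(s) / n, sum(y) / n
    sxx = sum((si - mean_x) ** 2 for si in s)
    sxy = sum((si - mean_x) * (yi - mean_y) for si, yi in zip(s, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 1.0 / slope, intercept / slope


def fit_michaelis_menten(
    s: Sequence[float], v: Sequence[float], max_iter: int = 50, tol: float = 1e-10
) -> Tuple[float, float]:
    """Least-squares (Vmax, Km) for v = Vmax*s/(Km + s) by Gauss-Newton."""
    vmax, km = _hanes_woolf(s, v)
    for _ in range(max_iter):
        # Accumulate the 2x2 normal equations (J^T J) d = J^T r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for si, vi in zip(s, v):
            denom = km + si
            residual = vi - vmax * si / denom
            j_vmax = si / denom              # d(model)/d(Vmax)
            j_km = -vmax * si / denom ** 2   # d(model)/d(Km)
            a11 += j_vmax * j_vmax
            a12 += j_vmax * j_km
            a22 += j_km * j_km
            b1 += j_vmax * residual
            b2 += j_km * residual
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        d_vmax = (b1 * a22 - a12 * b2) / det  # Cramer's rule for the 2x2 system
        d_km = (a11 * b2 - a12 * b1) / det
        vmax += d_vmax
        km += d_km
        if abs(d_vmax) <= tol * abs(vmax) and abs(d_km) <= tol * abs(km):
            break
    return vmax, km
```

The linearized estimate serves only as a starting point. The final values minimize squared residuals in v itself, as a direct fit to the Michaelis-Menten equation requires.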

Trypsinogen activation was assayed at pH 5.6 as described (Anderson et al., 1977; Lu et al., 1997). Assays (0.1 ml) contained 25 µM trypsinogen and 50 mM sodium citrate (pH 5.6) at room temperature. The reaction was initiated by addition of 2 nM enteropeptidase. After ten minutes, the reaction was terminated by the addition of 2 µl of 2 M HCl. To quantify the trypsin product, an equal volume of 250 µM S-2765 in 20 mM Tris-HCl (pH 8.4), 150 mM NaCl was added, and the absorbance at 405 nm was recorded after five minutes.

Changes in the free energy of transition state stabilization (ΔΔG‡) were calculated from the relationship $\Delta\Delta G^{\ddagger} = -RT\ln\left[(k_{\mathrm{cat}}/K_{\mathrm{m}})_{\mathrm{mutant}} / (k_{\mathrm{cat}}/K_{\mathrm{m}})_{\text{wild-type}}\right]$, where R is the gas constant, T is the absolute temperature, kcat is the turnover number, and Km is the Michaelis constant (Wilkinson et al., 1983).
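As a worked example of the relationship above, consider a mutant whose kcat/Km is one tenth of the wild-type value. Its ΔΔG‡ is −RT ln(0.1), roughly +1.36 kcal/mol at 25 °C. The short function below evaluates the same expression. Reporting in kcal/mol and taking 298.15 K as "room temperature" are assumptions, and the specificity constants in the example are placeholders.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_transition_state(
    spec_mutant: float, spec_wild_type: float, temp_k: float = 298.15
) -> float:
    """ddG_dagger (kcal/mol) = -RT ln[(kcat/Km)_mutant / (kcat/Km)_wild-type].

    Positive values mean the mutation destabilizes the transition state.
    """
    if spec_mutant <= 0 or spec_wild_type <= 0:
        raise ValueError("specificity constants must be positive")
    return -R_KCAL * temp_k * math.log(spec_mutant / spec_wild_type)


if __name__ == "__main__":
    # Placeholder kcat/Km values (M^-1 s^-1) representing a 10-fold loss.
    print(f"{ddg_transition_state(1.0e5, 1.0e6):.2f} kcal/mol")
```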

Inhibition by VD4K-chloromethane 

Reactions were performed in 200 µl of 100 mM Tris-HCl (pH 8.0), VD4K-cm (2 nM to 2 µM), and 2 nM enteropeptidase at 22 °C. At selected time intervals, 30 µl samples were removed and added to 200 µl of 100 mM Tris-HCl (pH 8.0), 300 µM Z-Lys-SBzl, and 180 µM DTNB to assay the remaining active enteropeptidase. For each concentration of inhibitor, the pseudo-first-order rate constant for inactivation, k', was determined from the relationship $\ln E = -k't + \ln E_0$, where E is the concentration of active enzyme remaining at time t, and E0 is the initial or total concentration of enzyme. The limiting rate constant for inactivation, k2, and the dissociation constant for reversible inhibitor binding, Ki, were determined from the relationship $k' = k_2[I]/([I] + K_i)$, where [I] is the inhibitor concentration (Kitz & Wilson, 1962). Changes in the free energy of transition state stabilization (ΔΔG‡) were calculated from the relationship $\Delta\Delta G^{\ddagger} = -RT\ln\left[(k_2/K_i)_{\mathrm{mutant}} / (k_2/K_i)_{\text{wild-type}}\right]$ (Wilkinson et al., 1983).
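The two-step analysis above can be sketched with ordinary linear regression. First, a pseudo-first-order k' is obtained at each [I]. Then k2 and Ki are obtained from the hyperbolic dependence of k' on [I]. The version below uses the double-reciprocal form 1/k' = 1/k2 + (Ki/k2)(1/[I]) introduced by Kitz and Wilson. A direct non-linear fit would weight the data differently, so treat this as an illustration rather than the procedure used here. The helper names and the synthetic test values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pseudo_first_order_k(times: Sequence[float], active: Sequence[float]) -> float:
    """k' from ln E = -k' t + ln E0, i.e. minus the slope of ln E versus t."""
    slope, _ = _line(times, [math.log(e) for e in active])
    return -slope


def kitz_wilson(
    inhibitor: Sequence[float], k_prime: Sequence[float]
) -> Tuple[float, float]:
    """(k2, Ki) from the double-reciprocal form 1/k' = 1/k2 + (Ki/k2) * (1/[I])."""
    slope, intercept = _line(
        [1.0 / i for i in inhibitor], [1.0 / k for k in k_prime]
    )
    k2 = 1.0 / intercept
    return k2, slope * k2
```

The ratio k2/Ki from this fit is the second-order constant that enters the ΔΔG‡ expression for the inhibitor.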

Protein Data Bank accession number 

The coordinates have been deposited with the Protein Data Bank for immediate release under accession code 1EKB.
Enteropeptidase (EC 3.4.21.9) is a key enzyme in the intestinal digestion cascade, responsible for the conversion of trypsinogen to trypsin, which then activates various pancreatic zymogens. To characterize the enzyme structurally, we purified it from porcine duodenal mucosa and showed that it consists of three polypeptide chains, which we named "mini" chain (M chain), light chain (L chain), and heavy chain (H chain) in order of increasing molecular size. Based on their NH2-terminal sequences, a cDNA clone for porcine enteropeptidase was isolated and analyzed. The clone was 3597 base pairs long and encoded 1034 amino acid residues of a single-chain precursor form of enteropeptidase. The precursor contained an additional NH2-terminal 51-residue sequence including a putative internal signal sequence, followed by the M chain (66 residues), the H chain (682 residues), and the L chain (235 residues), in that order. The H chain had regions partially homologous in sequence with the low density lipoprotein receptor and complement components. The L chain, in contrast, was highly homologous with the catalytic domains of trypsin-like serine proteinases. The structural model of the L chain suggests that the sequence Arg-Arg-Arg-Lys is probably involved in the unique substrate specificity of the enzyme, which prefers acidic amino acid residues at the P2-P5 sites.



Enteropeptidase (enterokinase, EC 3.4.21.9) is well known as the only physiological enzyme capable of converting trypsinogen to trypsin (1). The trypsin thus produced then converts various pancreatic zymogens, including trypsinogen itself, to their corresponding active enzymes. Enteropeptidase is therefore recognized as playing a key role in regulating intestinal protein digestion. Indeed, patients with primary enteropeptidase deficiency, a genetic disorder with little or no enteropeptidase activity in the duodenum, have been reported to suffer from malabsorption and malnutrition, particularly in infancy, and need to take drugs containing a pancreatic enzyme mixture for recovery (2).

Because of its physiological importance, there have been a number of studies on the purification and characterization of enteropeptidase from various species (3-9). These studies have shown that the enzyme is a trypsin-like serine proteinase with strict specificity toward substrates with a basic amino acid residue at the P1 site¹ and acidic residues at the P2-P5 sites, as expected from the NH2-terminal amino acid sequence (Val¹-Asp-Asp-Asp-Asp-Lys⁶) of bovine trypsinogen. In contrast, structural information on the enzyme is still limited. The molecular weights reported thus far range from 150,000 to 300,000, depending on the species. In addition, the number of constituent polypeptide chains has been reported differently: the enzyme was reported to be composed of two chains in pig (4) and cow (7, 9) and three chains in human (10). Available data indicate that in all cases the smaller polypeptide chain, called the light chain, is the catalytic chain (4, 10, 11), but the precise chain composition is not yet clear. This is largely due to the lack of information on the complete amino acid sequence of enteropeptidase, although the bovine light chain sequence has been reported very recently by LaVallie et al. (12).

We have recently established a purification procedure for enteropeptidase from porcine duodenal mucosa and found that, unlike the previous data (4), the enzyme consists of three different polypeptide chains, i.e. "mini" (M),² light (L), and heavy (H) chains. Furthermore, we have cloned and analyzed a cDNA coding for the protein and deduced its complete amino acid sequence. The results clearly indicate that enteropeptidase is synthesized as a single-chain precursor protein and is then processed to the mature enzyme. In this paper, we describe these results and discuss the substrate specificity of the enzyme based on a three-dimensional structure constructed by computer modeling.

MATERIALS AND METHODS 

Determination of Protein Concentration – Protein concentration was estimated colorimetrically using a protein assay kit (Bio-Rad) with mouse IgG as the standard (13).

Enzyme Purification – Enzyme activity was assayed essentially according to Liepnieks and Light (7) with some modifications. The purification procedure will be described in detail elsewhere. In brief, the mucosa was obtained from 40 porcine duodena by squeezing them with the fingers in 20 mM Tris-HCl (pH 6.0), and the crude extract was obtained from the mucosa by solubilizing with 1% sodium deoxycholate followed by centrifugation. The enzyme was purified from the extract by four steps of chromatography on columns of DE52 (5.4 × 40 cm, Whatman), Butyl Toyopearl 650S (2 × 20 cm, prepacked, Tosoh), Sephacryl S-300 (3.6 × 90 cm, Pharmacia Biotech Inc.), and benzamidine-Sepharose (0.9 × 25 cm, Pharmacia). The enzyme fractions obtained from the last column were pooled, concentrated, and used for further experiments.



¹ The nomenclature is according to Berger and Schechter (60).

² The abbreviations used are: M chain, "mini" chain; L chain, light chain; H chain, heavy chain; LDL, low density lipoprotein; PAGE, polyacrylamide gel electrophoresis.
Table I

Purification of porcine duodenal enteropeptidase
EKU is defined as nanomoles of trypsin produced in 30 min at 37 °C.
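Table I summarizes each purification step. The overall figures quoted in the Results (6.4% yield, 729-fold purification) follow the standard definitions. Yield is the fraction of the starting total activity recovered, and fold purification is the ratio of specific activities (EKU per mg of protein). Because the table body is not reproduced here, the sketch below uses hypothetical step values only to show the arithmetic. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Step:
    name: str
    protein_mg: float    # total protein in the pool
    activity_eku: float  # total activity in the pool (EKU)

    @property
    def specific_activity(self) -> float:
        """EKU per mg of protein."""
        return self.activity_eku / self.protein_mg


def summarize(steps: Sequence[Step]) -> List[Tuple[str, float, float, float]]:
    """(name, specific activity, % yield, fold purification) relative to the first step."""
    first = steps[0]
    rows = []
    for step in steps:
        yield_pct = 100.0 * step.activity_eku / first.activity_eku
        fold = step.specific_activity / first.specific_activity
        rows.append((step.name, step.specific_activity, yield_pct, fold))
    return rows


if __name__ == "__main__":
    hypothetical = [Step("crude extract", 1000.0, 5000.0), Step("final pool", 2.0, 500.0)]
    for row in summarize(hypothetical):
        print(row)
```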
Fig. 1. SDS-PAGE patterns of the purified enzyme. a, under reducing conditions using a 4-20% gradient gel; b, under reducing conditions using a 15-25% gradient gel; c, under nonreducing conditions using a 15-25% gradient gel. Approximately 30 µg of the enzyme was applied to each lane.



Polyacrylamide Gel Electrophoresis – Polyacrylamide gel electrophoresis (PAGE) was performed essentially according to Laemmli (14) using SDS-PAG plate 4/20 and Multigel 15/25 (Daiichi, Tokyo).

NH2-terminal Amino Acid Sequence Analysis – The purified enzyme sample was subjected to SDS-PAGE using 4-20% or 15-25% gradient gels, and the separated polypeptides were transferred to Immobilon P (Millipore) or Immobilon PSQ (Millipore) essentially according to LeGendre and Matsudaira (15). The proteins on the membranes were analyzed with an automated protein sequenator (model 477A, Applied Biosystems) on-line to a phenylthiohydantoin-derivative analyzer (model 120A, Applied Biosystems).

cDNA Cloning and Analyses – Total RNA was extracted from freshly resected porcine duodenal mucosa by the guanidinium isothiocyanate method and purified by CsCl density gradient ultracentrifugation (16). Poly(A) RNA was isolated using Oligotex dT-30 Super (Takara). Complementary double-stranded DNA was synthesized using a cDNA synthesis system plus (Amersham Corp.) from 5 µg of the poly(A) RNA as a template, with oligo(dT) or random hexanucleotide as a primer (17). The cDNA libraries were constructed using a cDNA cloning system (Amersham Corp.), except that the λZAP II/EcoRI vector (Stratagene) was used. A 53-mer oligonucleotide described under "Results" was synthesized by Sawady Technology (Tokyo). The probe was labeled at the 5'-end using [γ-32P]ATP (6000 Ci/mmol, Amersham Corp.) and a Megalabel labeling kit (Amersham Corp.). The DNA fragment probe was labeled by the multiprime method using [α-32P]dCTP (3000 Ci/mmol, Amersham Corp.) and a Megaprime labeling kit (Amersham Corp.). The transfer membrane used was Hybond N (Amersham Corp.), and the conditions of transfer, fixation, prehybridization, hybridization, and washing were essentially according to the manufacturer. For the 53-mer oligonucleotide probe, 45 °C was adopted as the temperature of prehybridization and hybridization, and 2 × SSC and 0.1% SDS at 60 °C as the stringent wash conditions. The cloned cDNA in the vector was automatically subcloned into the pBluescript phagemid, and double-stranded DNA in the phagemid was used as a template for DNA sequencing. DNA sequencing was performed by the dideoxy chain termination method (18) using a Taq dye primer sequencing kit (Applied Biosystems), a thermal cycler (model PJ 480, Perkin-Elmer), and a DNA sequenator (model 370A, Applied Biosystems).

Computer Modeling of Three-dimensional Structure of L Chain – A homology search for the L chain was performed in the Brookhaven Protein Data Bank by the multiple alignment system for protein sequences (62). The sequences of the 28 most homologous proteins of known three-dimensional structure were compared with that of the porcine L chain, and the L chain was divided into 13 parts so that each segment had a similar deletion and insertion profile. For each segment, one protein was selected from the homology list so as to minimize insertion and deletion and to maximize identity. Thus, a chimeric reference protein was constructed that was composed of the following segments: 1HNE (human neutrophil elastase) for positions 800-814, 815-825, and 839-856; 1DWB (human thrombin) for 826-838 and 869-892; 3RP2 (A chain, rat mast cell protease II) for 857-868; 4CHA (A chain, bovine α-chymotrypsin) for 893-930, 988-1003, and 1018-1034; 3EST (porcine pancreatic elastase) for 931-944; 1SGT (Streptomyces griseus trypsin) for 945-971; and 1TLD (bovine β-trypsin) for 972-987 and 1004-1017. Gly and Arg were inserted into the reference protein 1HNE by using the main-chain coordinates of Gln-Arg of Leu-Tyr-Gln-Gln-Arg-Asp-Val-Asn of 6TIM (triose-phosphate isomerase). The three-dimensional modeling of the L chain was performed using the chimeric protein as a reference protein according to Kcuihara et al. (19). Modeling of the complex of the L chain and Val-(Asp)4-Lys was also performed with the above structural model as a base protein, using the main-chain coordinates of Lys-Pro-Ala-Cys-Thr-Leu of the inhibitor part of PDB entry 3SGB (proteinase B from S. griseus complexed with the third domain of turkey ovomucoid inhibitor) for the initial arrangement of the hexapeptide, essentially according to the same method.

Fig. 2. Restriction enzyme mapping of the cDNA clones. The base pair numbers are according to the numbering of the longest clone, EK-2. EKR-1 and -2 were positive clones in the random-primed cDNA library, while EK-2, -3, -7, and -11 were positive in the oligo(dT)-primed library. All clones had the same map except for an EcoRI site in EK-7.
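The segment assignments of the chimeric reference protein in the modeling methods can be checked mechanically. The 13 template segments should tile the L chain (positions 800-1034, i.e. 235 residues) without gaps or overlaps. The sketch below encodes the ranges exactly as given in the text and verifies the tiling. Sorting them also shows that the fourth segment (839-856) comes from 1HNE, the template that received the Gly-Arg insertion. The data layout and function name are our own illustration.

```python
from typing import Dict, List, Tuple

# Template segments of the chimeric reference protein, copied from the text
# (porcine enteropeptidase numbering, inclusive ranges).
SEGMENTS: Dict[str, List[Tuple[int, int]]] = {
    "1HNE": [(800, 814), (815, 825), (839, 856)],
    "1DWB": [(826, 838), (869, 892)],
    "3RP2": [(857, 868)],
    "4CHA": [(893, 930), (988, 1003), (1018, 1034)],
    "3EST": [(931, 944)],
    "1SGT": [(945, 971)],
    "1TLD": [(972, 987), (1004, 1017)],
}


def check_tiling(
    segments: Dict[str, List[Tuple[int, int]]], start: int, end: int
) -> List[Tuple[int, int, str]]:
    """Sort segments by position and verify they cover start..end exactly once."""
    ordered = sorted(
        (lo, hi, code) for code, ranges in segments.items() for lo, hi in ranges
    )
    expected = start
    for lo, hi, code in ordered:
        if lo != expected:
            raise ValueError(f"gap or overlap at {code} segment {lo}-{hi}")
        expected = hi + 1
    if expected != end + 1:
        raise ValueError(f"coverage stops at {expected - 1}, expected {end}")
    return ordered


if __name__ == "__main__":
    for index, (lo, hi, code) in enumerate(check_tiling(SEGMENTS, 800, 1034), 1):
        print(f"segment {index:2d}: {lo}-{hi} from {code}")
```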

RESULTS 

Purification and Structural Characterization of Porcine Enteropeptidase – From 40 porcine duodena, 0.42 mg of the purified enzyme was obtained in 6.4% yield with 729-fold purification (Table I). The molecular weight of the enzyme was estimated to be approximately 200,000 by gel filtration (data not shown). As shown in Fig. 1a, SDS-PAGE using a gradient gel (4-20%) under reducing conditions gave two polypeptide bands with Mr ≈ 152,000 (H chain) and ≈ 48,000 (L chain). In addition, a cluster of bands was reproducibly observed near the dye front, which we named the "mini" chain (M chain). The M chain was shown to be composed of five or more separate polypeptides with Mr = 16,000-19,000 when analyzed using a 15-25% gradient gel, as shown in Fig. 1b. Upon SDS-PAGE under nonreducing conditions, the purified enzyme produced the M chain bands and a polypeptide band with Mr = 200,000 (Fig. 1c). We therefore concluded that the purified porcine mature enzyme is composed of three different polypeptides, the H, L, and M chains. The former two chains are associated covalently with each other, while the M chain is bound to the H and/or L chain non-covalently.

Fig. 3. The nucleotide and the deduced amino acid sequences of the cDNA clone EK-2. The boxed amino acid sequence is the hydrophobic segment presumed to be an "internal signal sequence." Underlines with (a), (b), and (c) indicate sequences that agreed with the NH2-terminal amino acid sequences determined for the M, H, and L chains of the mature enzyme, respectively. The underline at base pair numbers 3559-3564 indicates a polyadenylation signal. The residues in white letters are potential asparagine-linked glycosylation sites. The double underlines indicate Ser/Thr clusters as potential mucin-type glycosylation sites. Residues marked below indicate the enteropeptidase catalytic triad.

The NH2-terminal amino acid sequences of the H and L chains of the enzyme were shown to be SVTVTFDLIFAQWVSDENIKEELLQGIEA (29 residues) and IVGGXDSREGAXPXVVAXrYYNGQLLXGASLV (31 residues), respectively. For the M chain, analyses of the three bands separated by SDS-PAGE gave the same sequence, LGKSHEARGTMKTTXGVTYNPNL (23 residues). The molar ratio of the H, L, and M chains in the enzyme, estimated from the amounts of phenylthiohydantoin-derivatives obtained by NH2-terminal sequencing, was approximately 1:0.6:0.7 on average. Considering the variations in the yield of each chain and of phenylthiohydantoin-derivatives in the analytical procedures, this is taken to indicate that the three chains are associated in equimolar amounts to form the enzyme.

Table II

Comparison of the molecular weight of each chain calculated from the deduced amino acid sequence with that measured by SDS-PAGE, and the numbers of potential asparagine-linked glycosylation sites
The molecular weight was calculated assuming that no further processing occurs in the COOH-terminal region of each chain.

             Molecular weight (kDa)        Number of potential
             Calculated    SDS-PAGE        glycosylation sites
  M chain       7.6         16-19                  1
  H chain      75.4         152                   17
  L chain      26.4         48                     4

Isolation and Characterization of Porcine Enteropeptidase cDNA Clones – Based on part of the NH2-terminal sequence of the H chain (Phe10 to Ile27), we designed a 53-mer oligonucleotide probe including 16 inosines, 6-fold redundant and complementary to the coding strand: 5'-ATICCITGIATIA(A/G)ITCITCITnATITTITCITCI(C/G)(T/A)IACCCAITGIGCIAA-3'. First, we screened the random-primed cDNA library using the oligonucleotide probe. Of about 5 x 10^ independent clones, two positive clones (EKR-1 and -2) were isolated. Next, using the insert DNA of EKR-1 as a probe, we screened the oligo(dT)-primed cDNA library, whose cDNA was size-fractionated to be larger than approximately 1.6 kilobase pairs. Of 5 x 10' independent clones, 11 clones giving positive signals were isolated, 7 of which were later found to be fused with other cDNAs for unknown reasons and were excluded. The results of restriction enzyme mapping and DNA sequencing of both ends of the remaining four clones, named EK-2, -3, -7, and -11, and of EKR-1 and -2 are presented in Fig. 2. The six clones had essentially the same restriction enzyme map except for an EcoRI site in EK-7. EK-2 was judged to be the longest clone and was used for further sequencing.

Fig. 4. Comparison of the amino acid sequence of the catalytic chain of enteropeptidase with those of other serine proteinases. The catalytic chain sequence of porcine enteropeptidase is compared with those of bovine enteropeptidase (12), human hepsin (21), human plasma kallikrein (22), human factor XIa (45), dog tryptase (46), bovine trypsin (47), bovine chymotrypsin (48-51), and porcine elastase (52). Residues are expressed in one-letter code. Separate symbols mark residues identical to porcine enteropeptidase and deletions inserted to optimize the homology. Residues in white letters are the conserved catalytic triad, His, Asp, and Ser. The percentages of identity with porcine enteropeptidase are listed at the ends of the sequences.

Nucleotide and Deduced Amino Acid Sequences of cDNA Clone EK-2 – The nucleotide and the deduced amino acid sequences of EK-2 are shown in Fig. 3. The cDNA clone was 3597 base pairs long. It had a polyadenylation signal at base pair position 3559 and poly(A) at the 3'-end. The first ATG met the criteria for an initiator codon in eukaryotes (20). Assuming this codon to be the initiator, the open reading frame was 3102 base pairs long, and the deduced amino acid sequence was thus composed of 1034 residues. The boxed sequence from positions 19 to 43 was the most hydrophobic domain in the sequence. The NH2-terminal sequences of the M, H, and L chains were deduced to start at positions 52, 118, and 800, respectively. Thus, the enzyme is thought to be originally synthesized as a single-chain precursor (Mr = 114,763). Assuming that no further processing occurs in the COOH-terminal region of each chain, the M, H, and L chains contain 66, 682, and 235 amino acid residues, respectively. The molecular weight of each chain calculated from the deduced amino acid sequence was much smaller than that determined by SDS-PAGE (Table II), probably due to the presence of oligosaccharide chain(s).
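The chain boundaries and lengths stated for the EK-2 precursor are mutually consistent, as a short bookkeeping sketch shows. The 51-residue NH2-terminal sequence, the M chain (66), the H chain (682), and the L chain (235) are laid end to end. This reproduces the start positions 52, 118, and 800 and a total of 1034 residues. It also matches the 3102 base-pair open reading frame (3 × 1034). The sketch assumes, as the text does, no further COOH-terminal processing. The names are illustrative.

```python
from typing import Dict, List, Tuple

# Segment lengths (residues) in precursor order, as stated in the text.
PRECURSOR: List[Tuple[str, int]] = [
    ("NH2-terminal 51-residue sequence", 51),
    ("M chain", 66),
    ("H chain", 682),
    ("L chain", 235),
]


def layout(segments: List[Tuple[str, int]], first: int = 1) -> Dict[str, Tuple[int, int]]:
    """Map each segment to its inclusive (start, end) residue positions."""
    spans: Dict[str, Tuple[int, int]] = {}
    position = first
    for name, length in segments:
        spans[name] = (position, position + length - 1)
        position += length
    return spans


if __name__ == "__main__":
    spans = layout(PRECURSOR)
    for name, (start, end) in spans.items():
        print(f"{name}: {start}-{end}")
    total = max(end for _, end in spans.values())
    print("residues:", total, "coding bp:", 3 * total)
```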

A homology search for the deduced amino acid sequence by the FASTA program in the PIR protein database revealed that the catalytic (L) chain is homologous with those of trypsin- and chymotrypsin-like serine proteinases (Fig. 4). Human hepsin (21) and plasma kallikrein (22) showed over 40% identity. The bovine enzyme (12) was 89.8% identical with the porcine enzyme. The H chain, on the other hand, had interesting homologies in limited regions with certain proteins. The sequences at positions 195-236 and 654-692, which are homologous with each other, were homologous with those in complement C9 (23), the low density lipoprotein (LDL) receptor (24), and others (Fig. 5a). The sequences at positions 240-353 and 539-653 are also homologous with each other and were homologous with those in dorsal-ventral patterning protein (25), complements C1r (26) and C1s (27), and others (Fig. 5b). The sequence at positions 772-788 was homologous with those in factor X (28), protein C (29), hepsin (21), and others (Fig. 5c).
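Percent identities such as the 89.8% between the bovine and porcine catalytic chains depend on how aligned columns are counted. The minimal sketch below counts identical residues over columns where neither aligned sequence has a gap. That denominator is an assumption: some programs divide by the shorter sequence length or by the full alignment length. It will therefore not necessarily reproduce the figures in Fig. 4. The toy alignment is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identity (%) over ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing an inserted deletion
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Hypothetical toy alignment, not sequences from Fig. 4.
    print(f"{percent_identity('IVGG-SAW', 'IVGGQTAW'):.1f}%")
```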

Threc'dimensional Structure of L Chain of Porcine Entero* 
■peptidase as Deduced by Computer Aiodelinff^Three-dimen- 
sional structural modeling of the complex of the catalytic chain 
and the NH^ terminus of bovine trypsinogen, Val'-Asp-Asp- 
Asp-Asp-Lys®, was performed tising the chimeric reference pro- 
tein, which was 38.79b identical with the L chain with a 2-rest- 
due insertion in the fourth segment: The resulting model' is 
shown in Fig. 6a. The mode of binding of the NH,-terminal 

* The coordinate data of the model may be presented on request. 
Fig. 5. Comparison of partial sequences of the H chain with those of homologous regions in other proteins. a, the cysteine-rich sequence repeats are compared with the consensus sequences of human LDL receptor (24); human terminal complement components C7 (53), C8α (54), C8β (55), and C9 (23); human LDL receptor-related protein (56); human perlecan (57); and rat GP-330 (58). The residues identical in at least six sequences are boxed. b, C1r/s-type sequences are compared with the consensus sequence (25) among the sequences of human complement components C1r (26) and C1s (27), Drosophila dorsal-ventral patterning protein (DVPP (25)), and bone morphogenetic protein-1 (BMP-1 (59)). The residues identical between the enteropeptidase sequences and the consensus sequence are boxed. c, the sequence near the carboxyl-terminal end of the H chain is compared with those of the corresponding regions of human hepsin (21), human factor X (42), and human protein C (29). The residues identical in the four sequences are boxed. In a, b, and c, the values in parentheses indicate residue numbers; a deletion inserted to optimize the homology; a non-consensus residue.



hexapeptide of bovine trypsinogen with the active site region of the catalytic chain is also shown (Fig. 6b).

DISCUSSION 

The mature three-chain enzyme is thought to be generated by peptide bond cleavages from the single-chain precursor, in which the three chains are aligned in the order M, H, and L chains, starting from the NH2 terminus. Previously, the porcine enzyme was reported to be composed of two chains, an H chain (Mr = 134,000) and an L chain (Mr = 62,000) (4). On the other hand, the human enzyme was reported to be a three-chain enzyme (10). Two of the human chains have molecular weights of 140,000 and 54,000, comparable with those of the H and L chains of the porcine enzyme, respectively, but the third polypeptide (Mr = 120,000) of the human enzyme is much larger than the porcine M chain (Mr = 16,000-19,000). Thus, the M chain appears to be a newly identified component of the enzyme, although it is not clear at present whether the M chain is essential for the function of enteropeptidase.

The predicted amino acid sequence of the porcine enteropeptidase precursor contained a 51-residue peptide sequence, which is missing in the purified mature enzyme. This peptide contains a very hydrophobic segment (from Val to Ile) long enough to span the membrane. Since the precursor protein does not appear to have any other membrane-spanning segment or typical signal sequence, this hydrophobic segment presumably serves as an internal signal sequence (30-32) and keeps the enzyme bound to membranes. Enteropeptidase is localized to the brush border membranes of the duodenum and upper intestine (33, 34) in such a manner that its catalytic domain can freely contact extracellular trypsinogen. Therefore, the NH2-terminal region should reside on the cytoplasmic side and the COOH-terminal region on the outside of the cell. Thus, enteropeptidase is apparently a type II* integral membrane protein. The positively charged NH2-terminal residue(s) flanking the internal signal sequence are known to be an important part of a dominantly acting retention signal that creates the type II orientation (35). The NH2-terminal 51-residue peptide apparently meets these structural requirements.
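A hydrophobic segment of this kind can be located in a sequence with a sliding-window hydropathy scan. The sketch below is an illustration only and is not the method used in this study: it assumes the Kyte-Doolittle hydropathy scale, a 19-residue window, and a mean hydropathy of at least 1.6 as a rough indicator of a membrane-spanning segment. The function name and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not taken from this study).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """Return 0-based start indices of windows whose mean hydropathy >= threshold.

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    seq = seq.upper()
    starts = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[i:i + window]) / window
        if mean >= threshold:
            starts.append(i)
    return starts


if __name__ == "__main__":
    # Hypothetical sequence: a 20-residue Leu stretch flanked by Asp residues.
    toy = "D" * 10 + "L" * 20 + "D" * 10
    print(hydrophobic_windows(toy))
```

Applied to a real precursor sequence, a run of qualifying windows near the NH2 terminus would be consistent with, though not proof of, the internal signal-anchor assignment discussed above.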

As schematically shown in Fig. 7, the purified porcine enzyme evidently resulted from proteolytic cleavages at three sites. Cleavage at the Ala-Leu bond releases the enzyme from the membranes. Interestingly, Toyoda et al. (36) reported that elastase could release enteropeptidase activity from the brush border membranes. Cleavage of the Ala-Leu peptide bond is compatible with the substrate specificity of elastase; therefore, elastase may be responsible for this cleavage. In addition, other proteinases cleaving the Gly-Ser and Lys-Ile bonds must be present, although no information about them is available at present.

The H chain has a Ser/Thr-rich sequence at positions 172-187, comprising 12 residues of Ser/Thr. Such Ser/Thr-rich regions, which have been found in glycophorin A (37), LDL receptor (38), sucrase-isomaltase (39), aminopeptidase N (40), etc., are documented to be potential O-linked glycosylation sites. Indeed, polyclonal antibodies against human enteropeptidase were reported to cross-react with type A blood antigen (10), indicating the presence of O-linked oligosaccharide(s) in the enzyme. Thus, the Ser/Thr-rich segment in the H chain is presumably the region of O-linked carbohydrate attachment. In addition, 22 potential N-linked glycosylation sites are seen in the enzyme, in accord with the previous findings that the enzyme is heavily glycosylated (4, 6, 7). From the present study, the carbohydrate content of porcine enteropeptidase is estimated to be as much as 50% of the total weight.

* The nomenclature is according to von Heijne and Gavel (61).

Fig. 6. The three-dimensional structure of the L chain of porcine enteropeptidase constructed by computer modeling. a, the tube model of the main chain. Segments in the reference chimera protein derived from 3RP2, 1TLD, 1DWB, 4CHA, 1SGT, 1HNE, and 3EST are colored …, green, yellow, blue, magenta, cyan, and white, respectively. The side chains in the basic amino acid cluster, Arg-Arg-Arg-Lys, are shown with the Corey-Pauling-Koltun models colored in yellow, and those of the active site His and Ser in cyan and green, respectively. The ribbon model colored in red shows the main chain of part of the substrate, Val-Asp-Asp-Asp-Asp-Lys. b, the stick model of the enzyme interacting with part of the substrate, Val-Asp-Asp-Asp-Asp-Lys. The substrate part is shown with the red stick model. The side chains of the catalytic triad of Asp, His, and Ser of the enzyme are shown with the yellow stick model, and those of the amino acid residues of the enzyme interacting with the substrate are shown with the … stick model. The calculated distances for the two hydrogen bonds of the catalytic triad (Asp···His and His N···Ser O) and for the ionic pair Arg N···Glu O are 2.74, 3.08, and 2.65 Å, respectively. Those for the ionic pairs between the enzyme and the substrate trypsinogen (Arg N···Asp O, Arg N···Asp O, and Lys N···Asp O) and for the hydrogen bonds between enzyme and substrate main chain atoms (Tyr N···Asp O, Gly N···Asp O, and Gly N···Lys O) are 2.66, 2.76, and 2.51 Å and 2.76, 2.77, and 2.69 Å, respectively.
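Both kinds of potential glycosylation site discussed above can be screened for by simple pattern matching. A minimal sketch, assuming the usual Asn-X-Ser/Thr sequon (X not Pro) as the criterion for a potential N-linked site and, for Ser/Thr-rich regions, a 16-residue window containing at least 12 Ser/Thr residues (chosen here only to mirror the 12 Ser/Thr residues within positions 172-187). Function names and test sequences are hypothetical; positions are 0-based, whereas the text numbers residues from 1. Such scans report only potential sites; actual glycosylation must be established experimentally.

```python
import re


def n_glyc_sequons(seq: str) -> list[int]:
    """0-based positions of Asn in Asn-X-Ser/Thr sequons where X is not Pro."""
    # A zero-width lookahead lets overlapping sequons all be reported.
    return [m.start() for m in re.finditer(r"(?=N[^P][ST])", seq.upper())]


def ser_thr_rich_windows(seq: str, window: int = 16, min_st: int = 12) -> list[int]:
    """0-based start indices of windows containing at least min_st Ser/Thr residues."""
    seq = seq.upper()
    return [
        i for i in range(len(seq) - window + 1)
        if sum(aa in "ST" for aa in seq[i:i + window]) >= min_st
    ]


if __name__ == "__main__":
    print(n_glyc_sequons("MNGSAANPTQNVT"))        # Asn-Pro-Thr is skipped
    print(ser_thr_rich_windows("AAAA" + "ST" * 6 + "AAAA"))
```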

Two sets of repeating sequences are present in the H chain. We found two tandem repeats of 38 amino acids (about 30% identity), each including 6 conserved cysteine residues (Fig. 5a). Although the locations of the disulfide bonds in enteropeptidase have not been determined, these 6 cysteine residues are likely to form three intrachain disulfide bonds within each of the two repeats. The repeats are homologous with certain regions in proteins such as the terminal complement component C9 (23) and the LDL receptor (24). The seven homologous repeating sequences in the LDL receptor are thought to be the sites for interaction with apolipoproteins (38). Besides, polymeric complement C9 has recently been reported to have affinity for apolipoproteins (41). By analogy, the cysteine-containing repeats in enteropeptidase may also be the sites of interaction with other proteins such as apolipoproteins. As shown in Fig. 5b, the H chain contains another two segments with internal homology (about 25% identity), resembling partial sequences of complement components C1r (26) and C1s (27), etc. At present, the role of this C1r/s-type region in the enteropeptidase H chain is not known. In addition, a region near the COOH-terminal end of the H chain shows low but detectable sequence homology with the corresponding regions of the non-catalytic chains of some other serine proteinases (Fig. 5c). In protein C (29) and factor X (42), proteolytic cleavages in the activation process are known to occur at mono- or dibasic sites between these regions and the NH2 termini of the catalytic chains. By analogy, the enteropeptidase precursor may be cleaved first at the dibasic Lys-Lys site and then activated by cleavage at the NH2 terminus of the L chain.
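Identity values such as "about 30%" and "about 25%" depend on how an alignment is scored. A minimal sketch, assuming the two repeats have already been aligned to equal length with "-" as the gap character (the alignment step itself is not shown), and using one common convention: identical residues divided by all columns that are not gaps in both sequences. The function name is hypothetical, and other conventions (e.g., dividing by the shorter ungapped length) give somewhat different percentages.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns in which both sequences carry a gap.
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    print(percent_identity("AC-E", "ACDE"))  # the gapped column counts as a mismatch
```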

On the other hand, the L chain is highly homologous with the catalytic chains of other serine proteinases (Fig. 4). The three-dimensional structural model of the L chain indicates that the catalytic triad (His, Asp, and Ser) and the S1 pocket are situated essentially in the same manner as in trypsin (43). Moreover, in the S1 pocket, the Asp positioned at its bottom and the two Gly residues at its neck are also conserved in enteropeptidase, indicating that it is a typical trypsin-like serine proteinase. Since enteropeptidase has a strict specificity toward substrates with acidic amino acid residues at the P2-P5 sites, the presence of additional sites (S2-S5) for substrate side chain binding has been postulated (3, 8). Lysine residue(s) have been suggested to be important to the substrate specificity of porcine enteropeptidase by a chemical modification study (44). According to the present structural model of the porcine L chain including the NH2-terminal hexapeptide (Val1-Asp-Asp-Asp-Asp-Lys6) of bovine trypsinogen (Fig. 6b), the basic cluster sequence Arg-Arg-Arg-Lys, unique to enteropeptidase among the family of serine proteinases (Fig. 4), appears to make a turn structure adjacent to the S1 pocket and to interact with Asp2-Asp-Asp-Asp5 of trypsinogen through three strong salt bridges, each between an Arg or Lys residue of the cluster and an Asp residue of the substrate. This is consistent with the previous results indicating that an acidic amino acid at the P2 site in the substrate is essential and that those at the P3-P5 sites are beneficial for the cleavage (3, 8). In the bovine L chain, one of the Arg residues of the cluster is substituted with Lys (12), but the substitution does not seem to have any significant effect on the interaction with the substrates. Moreover, one Arg of the cluster makes an ion pair with a Glu residue of the enzyme. The carboxyl group of one Asp of the peptide does not interact with the enzyme in this model but may form an ion pair with the side chain of a Lys of bovine trypsinogen, as judged from a three-dimensional structure model (data not shown). Further, the main chain O atoms of two Asp residues and of Lys6 of the peptide substrate form three hydrogen bonds with the main chain N atoms of Tyr, Gly, and Gly of the enzyme, respectively. Thus, the unique substrate specificity of enteropeptidase can be explained clearly.

Fig. 7. The gross structure of the precursor form of porcine enteropeptidase, the sites of proteolytic processing, and potential asparagine-linked glycosylation sites. ISS, putative internal signal sequence; S/T, Ser/Thr-rich sequence; C9-a and -b, repeating sequences homologous with part of the sequences of complement C9/LDL receptor; C1r-a and -b, repeating sequences homologous with part of the sequences of complement C1r/s; CNCC, sequence near the COOH-terminal region of the H chain homologous with those of the noncatalytic chains of two-chain serine proteinases such as factor X and protein C; AP, putative activation peptide; Cat, catalytic domain. Closed circles indicate potential asparagine-linked glycosylation sites. Vertical arrows indicate proteolytic processing sites.
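The specificity described above (acidic residues at P2-P5 and Lys at P1, as in Val-Asp-Asp-Asp-Asp-Lys of trypsinogen) can be expressed as a simple sequence pattern. The sketch below is only a crude screen under that assumption: it ignores the structural interactions discussed above, accepts Glu as well as Asp as the acidic residue (an assumption), and its function name and test peptides are hypothetical.

```python
import re


def acidic_lys_sites(seq: str, n_acidic: int = 4) -> list[int]:
    """Return 0-based indices of the P1' residue after each (Asp/Glu)n-Lys run.

    The scissile bond lies between index - 1 (the P1 Lys) and index.
    """
    # Fixed-width lookbehind, e.g. (?<=[DE]{4}K) for n_acidic = 4.
    pattern = re.compile(rf"(?<=[DE]{{{n_acidic}}}K)")
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Hypothetical peptide modeled on the trypsinogen NH2 terminus.
    print(acidic_lys_sites("VDDDDKIVGG"))
```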
With roots in ancient formulations, methods for the chemical derivatization of proteins continue to expand and develop. The creation of this new journal dealing exclusively with bioconjugate chemistry was barely conceivable just a few years ago. An explosion of interest in the subject during the last decade is, however, easily seen. The tremendous growth in both the number of publications and the number of research groups involved in these kinds of studies has been promoted both by practical interests, related in some cases, for example, to possible pharmacological or medical diagnostic applications, and by interest in questions of fundamental biochemical structure and function.

Greatly improved understanding of established reagents and procedures and the development of many new, and more sophisticated, reagents and procedures have been facilitated by advances in the ancillary fields of organic chemistry, X-ray crystallography, and molecular biology. Whereas protein modification in the past often involved the same reagents and reactions commonly used in the organic chemistry of that time (e.g., acetylation, iodination, deamination, reaction with formaldehyde, etc.), those in most common use today have, by and large, been developed to meet the varied but relatively specific needs of the protein chemist. A large number of specialized reagents have been described: affinity labels, photoaffinity labels, and other specifically designed site-directed reagents (1, 2); group-selective reagents, which react exclusively (or at least predominantly) with one particular type of amino acid side chain (see below, especially Table II); and others that react relatively nonspecifically with a number of different side chains (3).

Reagents have been designed to preserve electrostatic charge (4, 5), to alter electrostatic charge (6), and to increase hydrophobicity (7, 8). Reagents and procedures have been developed to decrease immunogenicity (9, 10); to increase and decrease susceptibility to proteolysis (11-13); to increase UV or visible absorbance (14); to introduce fluorescent labels (15, 16), spin labels (17), radiolabels (18-20), various metal ions (21), magnetic microspheres (22, 23), and electron-dense substituents (24); to increase the content of certain low-abundance nonradioactive isotopes (25); and to attach several different types of carbohydrate moieties (26-29), biotin (30), and a number of other biospecific recognition groups (e.g., avidin, streptavidin, antibodies, protein A, protein G, lectins, and others (31)). Procedures also have been developed to effect the cleavage of peptide chains (32, 33); to modify enzyme specificity (34); to modify the terminal hydroxyls of galactosyl residues in glycoproteins (35); and to introduce intramolecular and intermolecular cross-links, both to couple already associated species (36, 37) and to join various proteins, which might or might not otherwise associate, in order to combine the properties of both into a single molecule, e.g., to make protein-protein conjugates (38, 39), enzyme-linked antibodies (40, 41), immunotoxins (42, 43), and drug-protein conjugates (44). A large number of reagents that have been developed to serve these and a variety of other purposes are commercially available.

EARLY DEVELOPMENTS 

The chemistry of proteins had its origin in the chemistry of the amino acids and only later concerned the amino acid side chains of intact proteins. For practical purposes, a variety of procedures for protein modification had been developed and used for many years prior to any significant interest in or understanding of protein chemistry. For example, the use of formaldehyde and other agents in the tanning industry was apparently formulated entirely on the basis of empirical observations, without any real understanding of the reactions or of the chemical nature of the materials involved. Similar procedures were also employed successfully to convert a number of protein toxins, usually of bacterial origin, into toxoids, which retain some of the original antigenic determinants but are no longer toxic. Inoculations of toxoids are still widely employed to confer immunity against a number of serious bacterial diseases. Although the procedure is still widely used, not much is known about the manner by which formaldehyde converts toxins into toxoids.

Interest in quantitative determinations of proteins and their various constituent amino acids was a major impetus for many early studies of chemical modification. While a significant number of proteins had been crystallized by the 1920s, analytical values for individual amino acids were still quite poor well into the 1940s. Analytical data had, for example, revealed only one sulfur-containing amino acid, cystine, in naturally occurring proteins prior to the discovery of methionine in 1922. Threonine was not discovered until 13 years later.

Most of the procedures available at that time for the determination of individual amino acids were, of course, supplanted by the development of the far more convenient cation-exchange amino acid analyzer in the 1950s. Slightly altered forms of some of those procedures, however, still find use today. Variations of the Van Slyke procedure for determining protein nitrogen, for example, are still sometimes useful for bringing about the selective deamination of proteins. Sodium nitroprusside, which was once used for spectrophotometric determinations of cysteine, also appears to be useful for the selective modification of protein thiol groups. Some much more recently developed procedures for protein modification, on the other hand, have been shown to be useful for analytical determinations of certain amino acids in proteins. The use of water-soluble carbodiimides and certain nucleophiles to determine amounts of glutamine and asparagine, and of 2-hydroxy-5-nitrobenzyl bromide to determine the tryptophan contents of proteins, is possibly of special interest, since the acid lability of those amino acids makes their determination difficult by conventional amino acid analysis (45, 46). The use of TNBS* for the determination of amino groups (47) and of DTNB for the determination of thiol groups (48) in intact proteins has also achieved special status as a result of their widespread use for such purposes.
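The DTNB determination mentioned above converts an absorbance reading into a thiol count through the Beer-Lambert law, since each reacting thiol releases one 2-nitro-5-thiobenzoate anion. A minimal sketch, assuming a blank-corrected absorbance at 412 nm, a 1-cm light path, and a molar absorption coefficient of 14,150 M^-1 cm^-1 for the released anion (a commonly used value; older work used 13,600 M^-1 cm^-1, and the appropriate value depends on conditions). The function name and the example reading are hypothetical.

```python
# Molar absorption coefficient of the released 2-nitro-5-thiobenzoate anion at
# 412 nm. ASSUMPTION: 14,150 M^-1 cm^-1 is a commonly used value; the correct
# value depends on buffer, pH, and denaturant conditions.
EPSILON_412 = 14_150.0


def thiols_per_molecule(a412: float, protein_molar: float,
                        path_cm: float = 1.0,
                        epsilon: float = EPSILON_412) -> float:
    """Reactive thiols per protein molecule from a blank-corrected A412 reading."""
    if protein_molar <= 0 or path_cm <= 0:
        raise ValueError("protein concentration and path length must be positive")
    released_molar = a412 / (epsilon * path_cm)  # Beer-Lambert: c = A / (epsilon * l)
    return released_molar / protein_molar        # one anion released per thiol reacted


if __name__ == "__main__":
    # Hypothetical reading: A412 = 0.283 for a 10 uM protein solution.
    print(round(thiols_per_molecule(0.283, 10e-6), 2))
```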

By the end of World War II, interest had turned to determining particular amino acid residues necessary for the biological activities of proteins. The idea that a particular amino acid residue in the active site of an enzyme might be identified on the basis of its reaction with selective chemical reagents was developed during this period. Those interests and further careful scrutiny of the available methodology led to the publication of two important reviews of protein modification in 1947 (49, 50). The report of Balls and Jansen (51) showing that the inactivation of several proteases by diisopropyl fluorophosphate resulted from its reaction with a specific serine residue in each case was another milestone of this period.

Some of the earliest attempts to use chemical modification procedures to identify particular amino acid residues required for the biological activity of a protein were conducted in the laboratory of Heinz Fraenkel-Conrat (52-54). A few of those procedures are still used, with little change, to this day. However, these earlier studies were seriously hampered by the absence of sensitive and accurate procedures to determine the number and type(s) of amino acid residues undergoing modification and by the absence of effective micro and semimicro procedures to separate, purify, and characterize products. The studies of that period, nevertheless, provided important descriptions of procedures for use by other investigators and served as important steps toward improved procedures.

* Abbreviations are as follows: trinitrobenzenesulfonic acid, TNBS; 5,5'-dithiobis(2-nitrobenzoic acid), DTNB; tosylphenylalanine chloromethyl ketone, TPCK; dithiothreitol, DTT; 1-ethyl-3-[3-(dimethylamino)propyl]carbodiimide, EDC.

Quantitative data on the extent of modification became more attainable with the increased availability of radioactively labeled reagents during the 1960s. Greater access to automated amino acid analyzers (55) and the development of effective ion-exchange and gel exclusion chromatography media at about the same time also facilitated the characterization of modified proteins, which led to a better understanding of many modification reagents and procedures. Various forms of micro gel electrophoresis also became commonplace in the same decade, and these greatly enhanced the ability to monitor the effects of modification on relatively small amounts of protein. The advent of an effective procedure for the routine determination of amino acid sequences, first described by Edman in 1956 (56), was also a major milestone. Although often considered routine today, these procedures were developed only after many years of effort and were essential for the characterization of various modification procedures.

SITE-SPECIFIC MODIFICATIONS 

In 1962, Wofsy and co-workers (57) described a selective reaction of the p-arsonylbenzenediazonium ion with the antigen-combining site of a rabbit anti-p-azobenzenearsonate antibody. This demonstration of affinity labeling was followed about 1 year later by the description of a highly selective reaction between chymotrypsin and a reactive substrate-like compound, TPCK (58). The latter was shown to effect the modification of a particular histidine residue of chymotrypsin with the complete elimination of its catalytic activity. The selectivity of these and other affinity labels results from their resemblance to a substrate or ligand. Their strong affinity for a particular site concentrates a reactive group, like the chloromethyl ketone moiety of TPCK, at a specific site, where its reaction with a nearby amino acid side chain is promoted by mutual proximity. Subsequent to these reports, a very large number of affinity labeling reagents have been described. Affinity labeling is now one of the most important methods for identifying amino acid residues in enzyme active sites. Table I describes some of the most commonly used types of affinity labeling reagents and summarizes a few of their salient properties.
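The two-step behavior described above, reversible binding followed by reaction within the complex, is commonly analyzed with the Kitz-Wilson scheme, E + I ⇌ E·I → E-I, which this review does not itself discuss. As an illustrative sketch under that assumption, and assuming the label is in large excess over enzyme, the observed first-order inactivation rate constant saturates as the label concentration rises; such saturation is one way affinity labeling is distinguished from a simple bimolecular reaction. Parameter names and values are hypothetical.

```python
import math


def kobs_affinity_label(label_molar: float, k_inact: float, K_I: float) -> float:
    """Pseudo-first-order inactivation rate constant (Kitz-Wilson model).

    k_inact: maximal inactivation rate constant (1/time) at saturating label.
    K_I: label concentration giving the half-maximal rate (same units as label_molar).
    """
    return k_inact * label_molar / (K_I + label_molar)


def residual_activity(t: float, label_molar: float, k_inact: float, K_I: float) -> float:
    """Fraction of activity remaining at time t, assuming label >> enzyme."""
    return math.exp(-kobs_affinity_label(label_molar, k_inact, K_I) * t)


if __name__ == "__main__":
    # Hypothetical parameters: k_inact = 0.2 per min, K_I = 100 uM.
    for conc in (10e-6, 100e-6, 1e-3, 10e-3):
        print(conc, kobs_affinity_label(conc, 0.2, 100e-6))
```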

SIDE CHAIN SELECTIVE MODIFICATIONS 

The use of side chain selective reagents (i.e., those which react, under certain specified conditions, with a single or, at least, a limited number of side-chain groups in a fairly predictable manner) is, however, a simpler approach. At least for initial screening, it is still widely used to identify amino acid side chains required for biological activity. Table II contains a list of some of the most commonly used and, in the authors' opinions, most useful group-selective reagents and brief descriptions of some of their important properties and applications.

The retention of biological activity after treatment with one of those reagents is usually good a priori evidence that the modified amino acid side chains are not required for that particular activity. Under appropriate conditions, each reagent normally reacts only with the indicated target side chain(s). Depending on the protein, the reagent, and the particular conditions, however, complete modification of all such side chains is not always obtained. In most cases, the extent of reaction can be determined by direct spectrophotometric measurements, amino acid analyses, or the use of radioactive
type; examples (target enzymes); reaction characteristics

α-halocarbonyl (RCOCH2X)
  examples (target enzymes): TPCK (chymotrypsin); 3-bromo-2-ketoglutarate (isocitrate dehydrogenase); chloroacetol sulfate (triose phosphate isomerase)
  reaction characteristics: addition to nucleophilic groups, especially His and Cys(SH); also COO-

epoxide (R-CH-CH2 bridged by O)
  examples (target enzymes): 1,2-anhydromannitol 6-phosphate (glucose 6-phosphate isomerase); glycidol phosphate (triose phosphate isomerase, enolase)
  reaction characteristics: addition to various nucleophilic groups, COO-, Cys(SH)

sulfonyl fluoride (RSO2F)
  examples (target enzymes): 5'-[p-(fluorosulfonyl)benzoyl]adenosine (glutamine synthetase, etc.)
  reaction characteristics: addition to various nucleophilic groups, Cys(SH), Lys, His, etc.

aldehyde (RCH=O)
  examples (target enzymes): 2',3'-dialdehydo-ATP (pyruvate carboxylase, adenylate cyclase, etc.); pyridoxal phosphate (glycogen phosphorylase, glutamine synthetase, …NA polymerase, etc.)
  reaction characteristics: 2',3'-dialdehydo-ATP is synthesized by periodate oxidation of ATP and adds to amino groups, especially in the presence of NaBH4; dialdehyde derivatives of other nucleotides and nucleosides may be employed similarly. Pyridoxal phosphate reacts with Lys in PLP and phosphate binding sites; the reaction is irreversible in the presence of NaBH4 or NaBH3CN.

azido (photoaffinity labels)
  examples (target enzymes): 8-azido-ATP (F1-ATPase); 5-azido-UDP (UDP-glucose pyrophosphorylase)
  reaction characteristics: requires UV irradiation; reacts by addition to nucleophiles and double bonds, insertion into C-H and O-H bonds, and other reactions
Table II. Useful Side Chain Modification Reagents*
(side chain or group; reagent or procedure; optimum reaction pH, side chain selectivity, and other comments; references)

amino (Lys + α)
  amidination (ethyl acetimidate): pH ~9; no other side chains react; positive charge maintained; other imido esters are available; extent of modification may be determined with TNBS (4, 71)
  reductive alkylation (formaldehyde + NaBH4 or NaBH3CN): pH ~9 with NaBH4, pH … with NaBH3CN, the reaction being much slower under the latter conditions; no other side chains react; positive charge maintained; other aldehydes and reducing agents may be used; extent of modification may be determined by amino acid analysis, the incorporation of radiolabel, or with TNBS (5, 25)
  pH ~8 and above; Tyr residues also modified; elimination of positive charge; extent of modification may be determined with TNBS (72)
  same as above; Tyr residues undergo slow deacylation above pH ~6; replaces positive charges with negative charges (73)
  pH ~8 and above; also reacts slowly with thiol groups; eliminates positive charge and introduces a large hydrophobic substituent; extent of reaction may be determined spectrophotometrically (47, 74)

carboxyl (Asp + Glu)
  pH ~4.6-5; some side reactions with Tyr and thiol groups; other carbodiimides are available; many other nucleophiles (amines) may be used to either maintain or alter the charge; extent of reaction may be determined by amino acid analysis or from incorporation of radiolabel (46, 75)

guanidino (Arg)
  pH ~7 or higher; reaction promoted by borate buffer; no major side reactions; partially reversible upon dialysis; eliminates positive charge; extent of reaction can be determined from incorporation of radiolabel or by amino acid analysis; other dicarbonyl compounds can also be used (e.g., cyclohexanedione, glyoxal) (76-79)

imidazole (His)
  pH …; side reactions with Lys kept to a minimum by low pH; extent of modification may be determined by spectrophotometric measurement; reversed in the presence of NH2OH (80, 81)

indole (Trp)
  usually pH ~4 or lower, although higher pH values can be used; thiol groups are rapidly oxidized; Tyr and His react more slowly; extent of modification may be determined spectrophotometrically or by amino acid analysis (82)
  pH <7.5; slight reaction with thiols; strong visible absorbance can be used to determine the extent of reaction (83, 84)

phenol (Tyr)
  pH ~8 or higher; many different procedures and reagents; His also reacts, but usually to a lesser extent; thiol groups are rapidly oxidized; both mono- and diiodo derivatives are formed; extent of reaction can be estimated spectrophotometrically or by amino acid analysis; widely used for radiolabeling of proteins (18, 85, 86)
  pH ~8 or slightly higher; thiol groups are also rapidly oxidized; some nitration of Trp; extent of reaction may be determined spectrophotometrically or by amino acid analysis (87)

thiol (Cys-SH)
  pH ~7 or higher; no effect on other residues under appropriate conditions; Lys, His, Tyr, and Met react slowly with excess reagent and long reaction times; extent of reaction may be determined with DTNB, by the incorporation of radiolabel, or by amino acid analysis (88, 89)
  pH ~6 or higher; reactions with Lys and His are much slower at pH 7 and usually of no importance; extent of reaction may be determined from incorporation of radiolabel or by amino acid analysis (90, 91)
  pH ~7 or higher; no other side chains react; reversible in the presence of excess low-MW thiol; extent of modification can be determined spectrophotometrically (48, 92)

thioether (Met)
  pH … and higher; thiol groups also react very rapidly; reversed by treatment with low-MW thiols; extent of modification may be determined by amino acid analysis after alkaline hydrolysis or by carboxymethylation followed by acid hydrolysis (93)

* Many useful reagents have not been included due to space limitations. Descriptions of reaction conditions, outcomes, and literature citations are also brief and incomplete for the same reason. More complete information is available in the references and other sources cited elsewhere in this review.
reagents. Indirect determinations can also be obtained from the number of unreacted amino acid residues, as determined either spectrophotometrically (e.g., amino groups by TNBS (47) or thiol groups by DTNB (48)) or by amino acid analysis. The extent of reaction can, of course, almost always be increased by the use of more vigorous reaction conditions, e.g., longer reaction times, larger excesses of reagent, and the presence of urea or other denaturing agents. Using more severe conditions, however, is usually accompanied by some decrease in side-chain selectivity, greater risk of conformational change, and, sometimes, other disadvantages. Reaction with side chains other than the target may be of little importance when activities are not affected.
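The indirect determination described above amounts to a subtraction. A minimal sketch, assuming the total number of target residues per molecule is known (e.g., from the unmodified protein assayed the same way) and that the number left unreacted after modification has been measured, for example by TNBS or DTNB titration. The function name and the numbers are hypothetical.

```python
def extent_of_modification(total_per_molecule: float,
                           unreacted_per_molecule: float) -> tuple[float, float]:
    """Return (residues modified per molecule, fraction modified)."""
    if total_per_molecule <= 0:
        raise ValueError("total must be positive")
    if not 0 <= unreacted_per_molecule <= total_per_molecule:
        raise ValueError("unreacted count must lie between 0 and the total")
    modified = total_per_molecule - unreacted_per_molecule
    return modified, modified / total_per_molecule


if __name__ == "__main__":
    # Hypothetical: 10 amino groups per molecule before treatment, 2.5 left after.
    print(extent_of_modification(10.0, 2.5))
```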

A major loss of biological activity upon such treatment is often taken as evidence for the essentiality of the group modified. But this interpretation must be made with somewhat less conviction, owing to the possibility of unrecognized conformational changes or other subtle effects that may always accompany the modification of a protein. The latter are obviously of less concern when fewer side chains are modified and for those modifications that effect the least change in the size and character of side chains. Luckily, a reasonable number of reagents are available for some of the more important side chains, allowing some discretion as to the nature of the modifications that may be effected. Rat liver glycine methyltransferase, for example, is completely inactivated by reaction with excess DTNB (94). The inactivated enzyme is, however, almost completely reactivated by subsequent treatment with potassium cyanide, which presumably brings about the replacement of a relatively large and anionic 2-nitro-5-thiobenzoate moiety by a smaller cyano group with no formal charge, as follows:

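$$
\begin{aligned}
\mathrm{P{-}SH} + \mathrm{DTNB} &\longrightarrow \mathrm{P{-}S{-}S{-}NTB} + \mathrm{NTB^{-}} \\
\mathrm{P{-}S{-}S{-}NTB} + \mathrm{CN^{-}} &\longrightarrow \mathrm{P{-}S{-}C{\equiv}N} + \mathrm{NTB^{-}} \qquad (1)
\end{aligned}
$$

where P-SH is a protein thiol and NTB is the 2-nitro-5-thiobenzoate moiety.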

A carboxymethyl moiety introduced by reaction with 
iodoacetate is also anionic but intermediate in size and 
effects only a partial loss of activity. The larger groups 
thus appear to block or otherwise perturb the active site, 
although none of the cysteine residues to which they are 
attached are really essential for catalytic activity. 

Similar inactivations have been noted following the addition of large or charged groups to the cysteine residues of many enzymes that are either not inactivated or are only partially inactivated by the addition of smaller groups. 2-Nitro-5-thiocyanatobenzoic acid can be used to effect a direct, single-step addition of cyano moieties to thiol groups (95, 96), although its reactions are not quite as simple as they might initially seem (97). Another reagent, methyl methanethiosulfonate, can be used to attach relatively small, uncharged thiomethyl groups to cysteine residues, usually with comparable results (98).

As a general rule, modifications that have the least effect on side-chain character should have the least effect on protein structure and properties. Modifications of lysine residues that retain their usual cationic charge have, for example, generally been found to have relatively little effect on the biological activities and other properties of many proteins. Complete guanidination of the ε-amino groups in tuna heart cytochrome c thus has almost no effect on its UV-visible spectrum, its redox potential, or its activity in a standard succinate oxidase assay system (99). The catalytic activity of papain is also essentially unaffected by complete guanidination (100). Amidination or reductive alkylation of amino groups, both of which also retain the cationic charge, are generally preferred today, however, as both of those reactions take place under milder conditions (4, 5, 25).

SIDE-CHAIN REACTIVITIES 

The reactivities of side-chain groups in proteins vary considerably depending on their locations and the influence of nearby residues with which they interact. Under appropriate conditions, differences in reactivity can be used to characterize the environments of such side-chain groups. For example, Kaplan and co-workers (101, 102) and others (103, 104) have developed procedures to determine the relative reactivities of certain types of side chains. These procedures measure the extent of reaction with trace levels of one of several simple reagents. The intrinsic reactivity and pKa of each reacting group can be determined by comparing its reaction to that of a simple model compound over a range of pH values.
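The pH-dependence analysis can be sketched minimally as follows. The sketch assumes, as is usual for nucleophilic groups, that only the unprotonated form reacts, so that k_obs = k_int / (1 + 10^(pKa − pH)). The function names, the grid-search fit, and the simulated rate data are all hypothetical. In practice the rates would be measured relative to the model compound mentioned above rather than fitted directly.

```python
def observed_rate(ph, k_int, pka):
    """Apparent second-order rate constant if only the unprotonated form reacts."""
    return k_int / (1.0 + 10.0 ** (pka - ph))


def fit_pka(phs, rates, pka_lo=5.0, pka_hi=12.0, step=0.001):
    """Grid search over pKa; for each trial pKa the best k_int is a linear LSQ fit."""
    best = None
    for n in range(int(round((pka_hi - pka_lo) / step)) + 1):
        pka = pka_lo + n * step
        frac = [1.0 / (1.0 + 10.0 ** (pka - ph)) for ph in phs]  # fraction reactive
        k_int = sum(f * k for f, k in zip(frac, rates)) / sum(f * f for f in frac)
        sse = sum((k - k_int * f) ** 2 for f, k in zip(frac, rates))
        if best is None or sse < best[0]:
            best = (sse, pka, k_int)
    return best[1], best[2]


# Simulated data for a hypothetical amino group with pKa 9.3 and k_int 120 M^-1 s^-1
phs = [7.0 + 0.5 * j for j in range(9)]      # pH 7.0 to 11.0
rates = [observed_rate(ph, 120.0, 9.3) for ph in phs]
pka, k_int = fit_pka(phs, rates)
print(f"pKa = {pka:.2f}, k_int = {k_int:.1f}")
```

A grid search keeps the sketch dependency-free. A nonlinear least-squares routine would serve equally well.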

For identical side-chain groups at different sequence positions, the observed differences in pKa and reactivity are assumed to reflect differences in local environment. Some side chains experience a change in environment upon the binding of a ligand, complexation with another protein, a change in redox state, or the like. Such side chains can be identified by comparing the extent of their reaction in the two different states. This approach has been used primarily to evaluate the environments of the nucleophilic side chains in proteins (amino groups and histidine and tyrosine side chains) (105, 106).

Different local environments may either suppress or enhance the reactivities of individual side-chain groups. Unusually reactive side chains are usually relatively easy to distinguish from others on the basis of their reactivity, and in many cases they are also the side chains required for biological activity. Rates of inactivation may differ from overall rates of modification. In many cases they can be used to characterize the reactivity and, sometimes, the number of active-site residues (107-109).

In many relatively simple cases, rates of inactivation can be correlated with those for the modification of one or more individual amino acid residues. For example, the catalytic subunit of rabbit muscle cAMP-dependent protein kinase has only two thiol groups and undergoes a biphasic reaction with DTNB (110). Its rapid inactivation under those conditions correlates with the initial, rapid phase of modification. This phase has been shown to reflect the reaction of one thiol group about 17 times faster than the other. In this and other cases where rates of inactivation exceed overall rates of modification, selectively labeled derivatives modified only at the active site can often be isolated and characterized (111-113).
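A short numerical illustration shows why inactivation can outpace overall modification. It rests on three assumptions. The two thiols react by independent pseudo-first-order reactions, with DTNB in excess. One thiol reacts 17 times faster than the other. Activity depends only on the faster-reacting thiol. The rate constants are arbitrary illustrative values, not data from the kinase study.

```python
import math


def modified_fraction(t, k_fast, k_slow):
    """Fraction of the two thiols modified at time t (pseudo-first-order kinetics)."""
    return 0.5 * (1 - math.exp(-k_fast * t)) + 0.5 * (1 - math.exp(-k_slow * t))


def remaining_activity(t, k_fast):
    """Activity assumed to be lost only when the fast-reacting thiol is modified."""
    return math.exp(-k_fast * t)


k_slow = 1.0             # arbitrary units (hypothetical)
k_fast = 17.0 * k_slow   # about 17 times faster, as described in the text
t90 = math.log(10) / k_fast   # time at which 90% of the activity is gone
print(f"activity left: {remaining_activity(t90, k_fast):.2f}")
print(f"thiols modified: {modified_fraction(t90, k_fast, k_slow):.2f}")
```

Under these assumptions, only about half of the total thiol content has reacted at 90% inactivation. This gap between inactivation and modification is what makes isolation of selectively labeled derivatives feasible.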

Activities remaining at various stages of partial modification can also be used, in some cases, to estimate the number of essential residues according to a procedure first described by Tsou in 1962 (114). For example, chicken egg white ovotransferrin loses iron-binding capacity after partial modification by phenylglyoxal. The extent of this loss suggests that one arginine residue is required for each of its two bound Fe3+ ions (76). Transketolase is a more complicated case: two arginine residues per dimer appear to be required for activity, but one appears to react with phenylglyoxal about 40 times faster than the other (115).
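The sketch below covers only the simplest case of Tsou's analysis. It assumes that all modifiable residues react at the same rate and that i of them are essential. The fraction of activity remaining is then a = x^i, where x is the fraction of residues still unmodified, so i is the slope of ln a versus ln x. The data are simulated, not taken from the studies cited. Tsou's treatment also covers residues of differing reactivity, which this sketch does not.

```python
import math


def essential_residue_count(unmodified_fraction, activity_fraction):
    """Slope through the origin of ln(a) vs ln(x); under a = x**i this equals i."""
    lx = [math.log(x) for x in unmodified_fraction]
    la = [math.log(a) for a in activity_fraction]
    return sum(p * q for p, q in zip(lx, la)) / sum(p * p for p in lx)


# Simulated data for a hypothetical enzyme with i = 2 essential residues
x = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]   # fraction of residues still unmodified
a = [xi ** 2 for xi in x]            # fraction of activity remaining
i_est = essential_residue_count(x, a)
print(f"estimated essential residues: {i_est:.2f} (nearest integer {round(i_est)})")
```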

SPECTROSCOPIC AND FLUORESCENT LABELS 

A number of important procedures requiring the incorporation of spectroscopic or fluorescent labels have been developed to characterize certain structural features of proteins. For example, the fluorescence lifetimes and quantum yields of many different fluorescent groups can be used to evaluate environments in the vicinity of the residues to which those groups have been attached (15, 116). So can the sensitivity of these groups to quenching by acrylamide, iodide, and other substances. Fluorescence energy transfer measurements are also widely employed to estimate distances between certain internal, or intrinsic, chromophores and various selectively introduced, extrinsic, fluorescent labels. In some cases they are used between selectively introduced, extrinsic donor-acceptor pairs (117, 118). Iodoacetamidofluorescein, dansyl chloride, and N-(1-pyrenyl)maleimide are three examples from a very large number of fluorescent labels that have been used for such purposes. Most may be considered analogues of commonly used group-selective reagents, and their reaction characteristics may be predicted accordingly.
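Two standard relations usually underlie the measurements described above. Energy transfer distances come from the Förster relation E = 1/(1 + (r/R0)^6). Quenching sensitivity is commonly summarized as a Stern–Volmer constant from F0/F = 1 + K_SV[Q]. The sketch below inverts the first relation and fits the second. R0, the quencher concentrations, and the intensity ratios are hypothetical. The Stern–Volmer fit also assumes purely dynamic quenching with a linear plot.

```python
def fret_distance(efficiency, r0):
    """Donor-acceptor distance r from E = 1 / (1 + (r / R0)**6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("transfer efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


def stern_volmer_constant(quencher, f0_over_f):
    """Least-squares K_SV from F0/F = 1 + K_SV * [Q] (line forced through [Q] = 0)."""
    num = sum(q * (r - 1.0) for q, r in zip(quencher, f0_over_f))
    den = sum(q * q for q in quencher)
    return num / den


R0 = 50.0  # angstroms; hypothetical Forster distance for some donor-acceptor pair
for e in (0.8, 0.5, 0.2):
    print(f"E = {e:.1f} -> r = {fret_distance(e, R0):.1f} A")

q = [0.05, 0.10, 0.20, 0.30]            # M quencher, hypothetical
ratios = [1.0 + 4.0 * c for c in q]     # simulated F0/F with K_SV = 4 M^-1
print(f"K_SV = {stern_volmer_constant(q, ratios):.2f} M^-1")
```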

Haugland (119) has presented an extensive list of such reagents, with brief descriptions of their principal reactions and their emission and excitation characteristics. Procedures to attach nitroxide moieties have also been employed, for example the reaction of 4-(2,2,6,6-tetramethyl-1-oxypiperidin-4-yl)-2-(fluorosulfonyl)benzamide with chymotrypsin. Such labels give information about the protein environment and can be used to detect conformational changes by EPR spectroscopy (17, 120).

CROSS-LINKING AND IMMOBILIZATION 

Cross-linking of proteins and their immobilization, either 
by attachment to an insoluble support or by various other 
means, have a long and important history. The former 
is sometimes employed to increase the stability of pro- 
teins or of certain conformational relationships in pro- 
teins, to couple two or more different proteins (e.g., to 
join different activities into a single molecule), to iden- 
tify or characterize the nature and extent of certain pro- 
tein-protein interactions, and, in other cases, to deter- 
mine distances between reactive groups in or between 
protein subunits (36, 37, 121-125). Proteins are sometimes immobilized to facilitate their reuse and their separation from other products and (in some cases) to increase
their stability. A large number of different procedures, 
including physical as well as chemical procedures, have 
been developed to immobilize proteins, and many reviews, 
symposia proceedings, and books on this subject are avail- 
able (126-130). 

A large number of different types of cross-linking or, 
as they are sometimes called, bifunctional reagents have 
been described. They include so-called zero-length cross- 
linking agents that bring about the direct formation of 
covalent bonds between existing amino acid side chain 
groups. Water-soluble carbodiimides bring about the formation of amide linkages between carboxyl groups of aspartate or glutamate and the ε-amino groups of lysine side chains. They appear to be the most prominent zero-length cross-linking agents (123, 131-133). Disulfide bonds formed from existing thiol groups would also, presumably, be considered zero-length cross-links (134, 135). Such linkages appear to be formed only when the reacting groups are in close proximity.

Other cross-linking agents may be organized according to several criteria:

- the type(s) of reactive groups and their side-chain reactivity;
- their hydrophobicity or hydrophilicity;
- the length or distance between the reactive groups;
- whether the two, or in some cases more (136), reactive groups are the same or different (i.e., "homobifunctional" or "heterobifunctional" reagents);
- whether the structure connecting the reactive groups is readily cleavable;
- whether the reagents are membrane permeable or impermeable;
- various other properties.

Table III lists the most widely used types of cross-linking agents, with a few brief comments on some of their significant properties. Ji (125) has presented a much more extensive list of cross-linking agents.

The reactivities of cross-linking agents, except for one 
or two special cases, are very similar to those of the cor- 
responding monofunctional reagents. The initial reac- 
tion with a protein is presumably, in most cases, a sim- 
ple second-order process, not seriously affected by the 
second reactive group. The latter's reaction, however, is completely dependent on the availability of a second appropriate side chain. For fast, efficient cross-linking, that side chain must be both nearby and in an appropriate orientation. Cross-linking agents have been developed with different lengths, different stereochemical configurations (some with little and others with a great deal of conformational flexibility), and with
different side-chain specificities have been developed to 
fulfill different needs. Distances between potentially reac- 
tive side chains in the same or different subunits of some 
oligomeric proteins have, for example, been estimated by 
comparing rates and yields of cross-link formation with 
a series of cross-linking agents differing in length, stere- 
ochemical configuration, and side-chain reactivity (139, 
155, 146). 

The importance of side-chain proximity in these reac- 
tions is perhaps most evident in the case of cross-link- 
ing agents that undergo hydrolysis or some other inacti- 
vation process in addition to their cross-linking of pro- 
teins. The use of bifunctional imidoesters to characterize 
oligomeric proteins, for example, is based on the forma- 
tion of recognizable SDS gel electrophoretic patterns, 
reflecting the formation of cross-links between adjacent 
subunits (139, 138). Like the cross-links within a sub- 
unit, those between subunits are formed only when two 
amino groups are in close and appropriate proximity. Cross- 
links between other than adjacent subunits are largely 
precluded by the hydrolytic instability of the monofunctional imidoester intermediates. Staros has discussed the effect of hydrolytic stability on yields of cross-linked products (37, 156).
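The competition described above can be sketched under simplifying assumptions. Once a reagent molecule is attached by one arm, its second arm has two first-order fates. It can react with a nearby amino group, with a pseudo-first-order constant k_x that reflects proximity and orientation. Or it can hydrolyze, with k_h = ln 2 / t1/2. The fraction cross-linked is then k_x / (k_x + k_h). The half-life and the k_x values are hypothetical, not values taken from Table III.

```python
import math


def crosslink_yield(k_crosslink, t_half_hydrolysis):
    """Fraction of singly attached reagent whose second group cross-links
    before it hydrolyzes, for two competing first-order processes."""
    k_hydrolysis = math.log(2) / t_half_hydrolysis
    return k_crosslink / (k_crosslink + k_hydrolysis)


T_HALF = 45.0  # min; hypothetical hydrolysis half-life of the pendant reactive group
# Larger k_x stands for a partner side chain that is closer and better oriented
for k_x in (1.0, 0.1, 0.01, 0.001):  # min^-1, hypothetical
    print(f"k_x = {k_x:<6} min^-1 -> cross-linked fraction {crosslink_yield(k_x, T_HALF):.3f}")
```

With a 45-min half-life, a partner group that captures the pendant arm within about a minute gives nearly quantitative cross-linking. A partner that needs many hours gives only about 6%. This is the proximity dependence described above.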

Of the 20 or so amino acid side chains normally present 
in proteins, the ε-amino groups of lysine residues are usually among the most abundant and most accessible of the potentially reactive groups. A relatively large proportion of the most commonly used cross-linking agents are therefore amino group-selective reagents (i.e., imidoesters, N-hydroxysuccinimide esters, activated aryl fluorides, etc.). Most of them, however, also undergo fairly rapid hydrolysis in addition to their reaction with amino groups. Except where the reacting groups are in close proximity, this hydrolysis seriously limits the yields that may be obtained. Glutaraldehyde does not hydrolyze or become otherwise inactivated over long periods of time. It is widely used
to immobilize enzymes by cross-linking and to stabilize 
their adsorption to or entrapment in various materials 
(157, 158). The nature of its reactions with proteins may 
involve some Schiff base formation but is clearly much 
more complicated than that and not completely under- 
stood (137, 159, 160). 

Thiol groups react readily with N-ethylmaleimide, iodoacetate, and many related α-halocarbonyl compounds. These high reactivities have led to the development of many cross-linking agents containing comparable maleimide and α-halocarbonyl moieties. Under the conditions usually
employed for cross-linking, the latter are much more stable to hydrolysis than the amino group reagents mentioned above. The yields of cross-linked products are therefore usually somewhat less dependent on side-chain proximity (161, 162).

Table III. Homobifunctional and Heterobifunctional Protein Cross-Linking Agents(a)

| agent | description | refs cited |
|---|---|---|
| Homobifunctional | | |
| glutaraldehyde | available as 25% aqueous solution; very effective reaction with amino groups and perhaps other nucleophilic groups; contains polymeric and other unknown materials; the nature of the reaction(s) is not known; slow progressive changes proceed long after the initial irreversible coupling | 137 |
| dimethyl suberimidate (DMS) | a water-soluble solid; reacts only with amino groups and does not eliminate their cationic charge; reaction at pH 8 or above (optimal at pH …); t1/2 ≈ 46 min at pH 8.6 and 26 °C; ~11-Å span; many related reagents with different spans, some readily cleavable, are available or can be easily synthesized | 138, 139 |
| disuccinimidyl suberate (DSS) | a water-insoluble solid; must usually be dissolved in DMSO or another water-miscible organic solvent; reacts with amino groups at pH 7 or above; reaction rates increase with pH; t1/2 ≈ 4–6 h at pH 7; ~11-Å span; many related reagents are available, with different spans, hydrophilic spacer arms, or cleavable linkages, as are water-soluble sulfosuccinimide esters | 140, 141 |
| bismaleimidohexane (BMH) | a water-insoluble solid; must usually be dissolved in DMF or another water-miscible organic liquid; reacts with thiol groups at pH ~6–8; …-Å span; many related reagents with different span lengths; more hydrophilic spacer arms and cleavable analogs are available | 142, 143 |
| p-phenylenedimaleimide | a water-insoluble solid; must usually be dissolved in a water-miscible organic solvent; reacts with thiol groups at pH ~6–8; 12-Å span; ortho and meta isomers are also available; less stable than aliphatic maleimides | 144-146 |
| Heterobifunctional | | |
| m-maleimidobenzoic acid N-hydroxysuccinimide ester (MBS) | a water-insoluble solid; must usually be dissolved in a water-miscible organic liquid; initial reaction with the amino group component at pH ~…–8, followed by coupling with the thiol component at pH ~6–8; ~10-Å span; a more water-soluble sulfosuccinimide ester is also available | 147, 148 |
| N-succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate (SMCC) | a water-insoluble solid; must usually be dissolved in a water-miscible organic solvent; reaction characteristics very similar to those of MBS; ~12-Å span; a more water-soluble sulfosuccinimide ester is also available | 149, 150 |
| N-succinimidyl 3-(2-pyridyldithio)propionate (SPDP) | a water-insoluble solid; must usually be dissolved in a water-miscible organic solvent; initial reaction with the amino component at pH ~7–8.5; then either coupling to the thiol component at pH 7 or above, or treatment with DTT followed by coupling to a maleimidylated protein; ~7-Å span | 151, 152 |
| 2-iminothiolane ("Traut's reagent") | a water-soluble solid; reacts only with amino groups at pH 7–10 without eliminating their charge; reaction may be followed with DTNB; ~…-Å span; may be coupled directly to MBS-, SMCC-, or SPDP-treated proteins | 153, 154 |

(a) Many more cross-linking agents have been described. Those included appear to be among the most widely used and most important at the present time. Please consult references in the text for additional examples.

A large number of heterobifunctional cross-linking reagents have been developed. They usually contain one thiol-reactive and one amino group-reactive moiety. N-Alkyl- or N-arylmaleimide and α-halocarbonyl groups are the most common of the former, and N-hydroxysuccinimide esters appear to be the most common of the latter. Sodium salts of sulfonated N-hydroxysuccinimide esters are also commonly employed, to increase aqueous solubility (163). Besides the two reactive groups, these reagents have used a variety of connecting structures, or spacer arms. The nature of the spacer arm may, of course, also have important consequences. Longer spacer arms are usually assumed to be more effective for coupling larger proteins, or proteins whose potentially reactive side chains are sterically protected. The conformational flexibility, hydrophilicity or hydrophobicity, and "cleavability" of the spacer arm are also important considerations. N-Alkylmaleimides are also generally more stable than their aryl counterparts (162, 164).

Photoactivatable heterobifunctional cross-linking agents are particularly useful for identifying interacting components in complicated biological systems (165). For example, Wood and O'Dorisio (166) identified vasoactive intestinal peptide receptors in human lymphoblasts by coupling the receptors to 125I-labeled vasoactive intestinal peptide. They used N-succinimidyl 4-azidobenzoate, N-succinimidyl 6-[(4'-azido-2'-nitrophenyl)amino]hexanoate, and two nonphotoactivatable homobifunctional cross-linking agents. A photoactive derivative of an N-formylated chemotactic peptide, prepared by reaction with the last-mentioned photoactivatable agent, has also been used to characterize the N-formyl peptide receptors of human polymorphonuclear leukocytes (167).

The initial reaction with photoactivatable cross-link- 
ing agents is usually conducted in the dark so that the 
photoreactive group is inert. Cross-linking is then initi- 
ated in a subsequent step involving exposure to light. 
Two kinds of photoactivatable groups are the most common: azido groups, which are converted into highly reactive nitrenes, and diazo moieties (i.e., diazoacetyl, diazo ketones, etc.), which give even more reactive carbenes upon photoactivation. They are the most common photoactivatable groups
in use at this time (2, 3). Being so reactive, both react 
relatively indiscriminately with OH, NH, CH, and C=C 
moieties in their vicinity and have short half-lives. Their 
reaction with surrounding solvent usually precludes reac- 
tion with groups not in their immediate vicinity and leads 
to quite low yields. The detection of cross-linked prod- 
ucts thus often provides a good record of spatial relation- 
ships at the moment of photolysis but the yields are not 
adequate for most preparative purposes. 

Heterobifunctional cross-linking agents are particu- 
larly useful for conjugating different proteins. The dif- 
ferent side-chain reactivities of the two reactive groups, 
for example, usually permit the coupling to be carried 
out in a stepwise manner which allows, in some cases, 
for partial purification and, if desired, characterization 
of intermediates prior to the actual conjugation. Due to 
the hydrolytic instability of the most important groups directed at amino side chains, the first step usually involves addition of the cross-linker to the amino groups of one member of the future hybrid pair. This member either has no thiol groups or has any thiols it contains at least temporarily blocked. Unreacted or hydrolyzed reagent and other unwanted substances can usually be removed at this stage. The resulting derivative is then coupled directly to the thiol-containing member of the intended hybrid pair via the introduced thiol-reactive maleimido or α-halocarbonyl group(s).

An artificial antibody-ricin conjugate, for example, has been prepared by treating ricin with m-maleimidobenzoyl N-hydroxysuccinimide ester. The resulting m-maleimidobenzoyl derivative was then incubated with a partially reduced monoclonal antibody (248). Such two-step procedures preclude the formation of unwanted homoprotein conjugates. Purification of the resulting hybrid conjugates by exclusion chromatography is usually rather easy, since they should be significantly larger than any of their precursors. Iodoacetyl derivatives of avidin, alkaline phosphatase, and at least four other proteins are commercially available.

Several reagents have been employed to introduce thiol groups into proteins, which may then be used for conjugation to other proteins or various other materials. Five examples can all be used under mildly alkaline conditions to introduce thiol groups into proteins: N-acetylhomocysteine thiolactone (168), (S-acetylthio)succinic anhydride (169), N-succinimidyl S-acetylthioacetate (170), 2-iminothiolane (153), and N-succinimidyl 3-(2-pyridyldithio)propionate (151). In the second and third cases, the acetyl moiety must subsequently be removed, usually by treatment with hydroxylamine, to release the thiol group. In the last case, a small amount of DTT or some other simple thiol must be used to effect a comparable cleavage of the 2-pyridyl disulfide moiety. The resulting thiol groups can potentially be coupled to many different maleimidyl or α-halocarbonyl groups. These include, for example, those of certain protein-maleimidyl conjugates, as follows (171, 150):



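$$\mathrm{P{-}NH_2} \xrightarrow{\;\text{SPDP}\;} \mathrm{P{-}NH{-}CO{-}CH_2CH_2{-}S{-}S{-}C_5H_4N} \xrightarrow{\;\text{DTT}\;} \mathrm{P{-}NH{-}CO{-}CH_2CH_2{-}SH} \xrightarrow{\;\mathrm{P'}\text{-maleimide}\;} \mathrm{P{-}NH{-}CO{-}CH_2CH_2{-}S{-}(succinimido){-}P'} \qquad (2)$$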



Even more important, probably, is the ability of the lat- 
ter substituent to undergo direct coupling with the thiol 
groups of other proteins as follows (152, 172): 



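$$\mathrm{P{-}NH{-}CO{-}CH_2CH_2{-}S{-}S{-}C_5H_4N} + \mathrm{P'{-}SH} \longrightarrow \mathrm{P{-}NH{-}CO{-}CH_2CH_2{-}S{-}S{-}P'} + \text{pyridine-2-thione} \qquad (3)$$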



Several 2-pyridyl disulfide-protein conjugates are commercially available. However, disulfide linkages are susceptible to cleavage by low molecular weight thiols. This susceptibility appears to preclude many applications of such conjugates, including most of those involving exposure to physiological conditions.

2-Iminothiolane is probably the most important reagent for introducing thiol groups into proteins. It is quite water soluble, whereas the others really are not. It reacts rapidly with amino groups at pH 7 (or preferably a little above), and it does not require an additional activation step to release the thiol moiety. It alone preserves the cationic charges of the modified amino groups. As with the other reagents used to introduce thiol groups, thiols introduced via reaction with 2-iminothiolane can undergo oxidative coupling to other protein thiols. They may also react with various maleimidyl or α-halocarbonyl groups, as follows (173, 154):




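$$\mathrm{P{-}NH_2} + \text{2-iminothiolane} \longrightarrow \mathrm{P{-}NH{-}C({=}NH_2^+){-}CH_2CH_2CH_2{-}SH} \xrightarrow{\;\mathrm{P'}\text{-maleimide}\;} \mathrm{P{-}NH{-}C({=}NH_2^+){-}CH_2CH_2CH_2{-}S{-}(succinimido){-}P'} \qquad (4)$$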



CONCLUSION 

Space and time limitations have precluded the discus- 
sion of many important related subjects. We had hoped, 
in particular, to discuss the radiolabeling of proteins. Biot- 
inylation also deserves serious discussion. We apologize 
to the many authors whose works we have failed to cite 
and particularly to those whose results we may have mis- 
interpreted or misrepresented. We would also like to call 
the readers' attention to a number of reviews and books 
on this subject, where more complete information can 
be obtained (174-183). 
Inhibition of urokinase has been shown to slow tumor growth and metastasis. To utilize structure-based drug design, human urokinase was re-engineered to provide a more suitable crystal form. The redesigned protein consists of residues Ile16–Lys243 (chymotrypsin numbering system; Ile159–Lys404 in the urokinase numbering system). It also carries two point mutations, C122A and N145Q (C279A and N302Q). The protein yields crystals that diffract to ultra-high resolution at a synchrotron source. The native structure has been refined to 1.5 Å resolution. This new crystal form contains an accessible active site that facilitates compound soaking. Soaking was used to determine the co-crystal structures of urokinase with three small molecule inhibitors at 2.0–2.2 Å resolution: amiloride, 4-iodo-benzo(b)thiophene-2-carboxamidine, and phenylguanidine. All three inhibitors bind at the primary binding pocket of urokinase. The structures of amiloride and 4-iodo-benzo(b)thiophene-2-carboxamidine also reveal that the halogen atom of each is bound at a novel structural subsite adjacent to the primary binding pocket. This site consists of residues Gly218, Ser146, and Cys191–Cys220 and the side chain of Lys143. This pocket could be utilized in future drug design efforts. Crystal structures of these three inhibitors in complex with urokinase reveal strategies for the design of more potent nonpeptidic urokinase inhibitors.



Cancer cell invasion, the spread and growth of tumor metastases, is a primary cause of mortality and morbidity of malignancy (2). This invasion requires the degradation of basement membranes and other extracellular protein structures. Urokinase has been shown to be strongly associated with tumor cells (3). It also plays a role in basement membrane degradation via a cascade mechanism involving activation of plasminogen and the metalloproteases (4-6). Furthermore, inhibitors of urokinase have been reported to slow tumor metastasis as well as growth of the primary tumor (7-15). These inhibitors include the small molecules 4-iodo-benzo(b)thiophene-2-carboxamidine (B428), 4-benzodioxolanyletheyl benzo(b)thiophene-2-carboxamidine (B623) (12-14), and amiloride (8, 15). These compounds are competitive inhibitors of urokinase and have been proposed to bind at the primary binding pocket common to all trypsin-like serine proteases (15). However, none of these compounds possesses all of the characteristics of a good therapeutic agent for the treatment of cancer.

Structure-based drug design has become an important tool 
for improving the potency and pharmacological characteristics 
of compounds toward providing therapeutic agents. This 
method has contributed to the development of potent and spe- 
cific inhibitors for many targets such as HIV protease, cy- 
clooxygenase-2, influenza neuraminidase, and the metallopro- 
teinases (16-22). To most efficiently apply crystallography- 
driven structure-based drug design, it is preferable that the 
crystals have certain properties. One property is that the active site of the target is open in the crystal lattice. This molecular packing permits compounds to diffuse into and bind at the active site, so crystal growth need not be optimized in the presence of each inhibitor. Another important property is that the crystals reproducibly diffract to high resolution (2.5–2.0 Å). Ideally this data quality is achievable on a conventional rotating anode source, which avoids travel to synchrotron facilities. Higher resolution data facilitate unambiguous map interpretation and minimize the average atomic positional error (23).
Hence, an appropriate crystal form can greatly facilitate the 
process of structure-based drug design. A crystal system exists 
for urokinase, although it does not fully encompass the pre- 
ferred properties outlined above. 

Human low molecular weight (LMW) urokinase has been 
crystallized in complex with the peptidic inhibitor Glu-Gly-Arg- 
chloromethyl ketone (1). This structure reveals the geometry of 
the urokinase active site as well as the orientation of a peptide 
inhibitor in the substrate-binding groove. However, the LMW urokinase crystals diffract to lower resolution: 2.5 Å with synchrotron radiation and 3.0 Å with a rotating anode source. They also rely on co-crystallization to obtain the target-ligand complex. In addition, a noncrystallographic 2-fold axis near the active site places the active site in close contact with another molecule. This interaction could limit minor ligand-induced conformational shifts and perhaps distort the active-site conformation. Furthermore, the noncrystallographic and crystallographic packing effectively blocks the active site. Even without the irreversible covalent inhibitor, it would therefore be difficult to diffuse small molecules into the active site in this crystal form. Hence, although this system may be used
for modeling of small molecule urokinase inhibitors, it may not 
provide an ideal system for structure-based drug design. There- 
fore, to design an anti-cancer therapeutic, a new crystal form of 
human urokinase was sought to facilitate the application of 
structure-based drug design. The strategy utilized protein en- 
gineering and information from the reported LMW urokinase 
structure to design an altered protein sequence to yield a new 
crystal form. 
Crystal Structures of Urokinase at High Resolution



The new form of urokinase, micro-urokinase, crystallizes 
under conditions very similar to the low molecular weight form 
(1), although crystal packing and data quality are very differ- 
ent. This new crystal form contains a monomer in the asym- 
metric unit and diffracts to ultra-high resolution ~ 103 
A). In addition, this crystal form has an open active site per- 
mitting direct diffusion of compounds into the apo-crystals and 
is therefore ideal for providing precise structure determina- 
tions for urokinase ligand complexes by the soaking technique. 

The re-engineered crystal system and soaking technique 
were utilized to determine the co-crystal structure of urokinase 
in complex with a series of small molecule inhibitors at 2.0 or 2.2 Å resolution. Two of these inhibitors, amiloride (24) and B428 (25, 26), have been shown to reduce tumor size and metastasis (8, 12-15). The effect of the third, phenylguanidine (27), has not been reported to date. These complex
structures were completed to determine the binding orienta- 
tion of each compound to urokinase. This information in turn 
may be utilized to design molecules of increased potency to- 
ward discovery of an anti-cancer therapeutic compound. 

EXPERIMENTAL PROCEDURES 

Recombinant Micro-urokinase — Micro-urokinase was engineered by 
polymerase chain reaction manipulations using a human urokinase 
cDNA as a template (28). The C279A and N302Q mutations were made by polymerase chain reaction-based site-directed mutagenesis. The urokinase native leader sequence was fused directly to Ile159 by polymerase chain reaction. This product was ligated into the baculovirus transfer vector pJVP10Z (29). The final expression vector sequence was confirmed by DNA sequencing.

The pJVP10Z-micro-urokinase vector was transfected into Sf9 cells
by the calcium phosphate precipitation method using the BaculoGold kit from PharMingen (San Diego, CA). A single recombinant virus expressing micro-urokinase was plaque-purified by standard methods, and a large stock of the virus was prepared. Large scale expression of micro-urokinase was performed in suspension in High Five cells (Invitrogen, San Diego, CA) growing in Excel 405 serum-free medium (JRH Biosciences, Lenexa, KS) at 27 °C. Urokinase activity in the supernatant was measured by amidolysis of the chromogenic urokinase substrate H-D-pyroglutamyl-Gly-L-Arg-p-nitroanilide (S2444; Helena Laboratories, Beaumont, TX). The culture supernatant was harvested as the starting material for purification. Protease inhibitors were added to the pooled culture medium: iodoacetamide (10 mM), benzamidine (5 mM), and EDTA (1 mM). The medium was diluted 5-fold with 5 mM HEPES, pH 7.5, and filtered through 1.2- and 0.2-μm membranes. The medium was then passed through a Sartorius S100 membrane adsorber (Sartorius, Edgewood, NY) at a flow rate of 50–100 ml/min to capture the micro-urokinase protein. The membrane was washed extensively with 10 mM HEPES, pH 7.5, containing 10 mM iodoacetamide, 5 mM benzamidine, and 1 mM EDTA. Micro-urokinase was then eluted from the S100 membrane with a NaCl gradient (20–500 mM, 200 ml) in 10 mM HEPES buffer, pH 7.5, 10 mM iodoacetamide, 5 mM benzamidine, 1 mM EDTA. The eluate was diluted 10-fold with the above 10 mM HEPES buffer containing inhibitors and loaded onto an S20 column (Bio-Rad). Micro-urokinase was eluted with a 20-column-volume NaCl gradient (20–500 mM). No inhibitors were used in the elution buffers. The eluate was then diluted 5-fold with 10 mM HEPES buffer, pH 7.5, and loaded onto a heparin-agarose (Sigma) column. Micro-urokinase was eluted with a NaCl gradient from 10 to 250 mM. The heparin column eluate of micro-urokinase was applied to a benzamidine-agarose (Sigma) column equilibrated with 10 mM HEPES buffer, pH 7.5, 200 mM NaCl. The column was washed with the equilibration buffer, and the urokinase was eluted with 50 mM NaOAc, pH 4.5, 500 mM NaCl. The micro-urokinase eluate was concentrated to 4 ml by ultrafiltration and applied to a Sephadex G-75 column equilibrated with 20 mM NaOAc, pH 4.5, 100 mM NaCl. The single peak containing micro-urokinase was collected and lyophilized as the final product.

Amidolytic Kinetics of Urokinase and Micro-urokinase. The effects of synthetic inhibitors on the steady-state amidolytic activity of LMW urokinase or micro-urokinase toward the chromogenic substrate S2444 (Helena Laboratories) were characterized by following the formation of p-nitroaniline (30). Briefly, inhibitors at 0-50 µM were tested against 25 IU/ml (0.14 µg/ml) LMW urokinase or micro-urokinase and 0.4-4.0 mM S2444 in 200-µl volumes of phosphate-buffered saline and 0.01% bovine serum albumin, pH 7.4. Incubations were performed at 37 °C, with absorbance at 405 nm recorded every 11 s for 20 min. Data were plotted as 1/v versus 1/[S] for Lineweaver-Burk analysis and the calculation of inhibition constants. Ki values were obtained from replots of the resulting slopes versus [I] (26, 31).
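The Ki determination described above (Lineweaver-Burk plots at several fixed inhibitor concentrations, followed by a replot of the slopes against [I]) can be sketched in Python as follows. This is a minimal illustration assuming ideal competitive inhibition and noise-free rates; the Km, Vmax, and Ki values in the demo are hypothetical and not taken from this work, while the substrate and B428 concentrations follow the Fig. 2 legend. For competitive inhibition, 1/v = (Km/Vmax)(1 + [I]/Ki)(1/[S]) + 1/Vmax, so each slope is linear in [I], and Ki equals the replot intercept divided by the replot gradient.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def competitive_rate(s, i, vmax, km, ki):
    """Initial rate for a competitive inhibitor: apparent Km = Km * (1 + [I]/Ki)."""
    return vmax * s / (km * (1.0 + i / ki) + s)


def ki_from_slope_replot(substrate, inhibitor, rates):
    """Estimate Ki from Lineweaver-Burk slopes; rates[j][k] is v at inhibitor[j], substrate[k]."""
    inv_s = [1.0 / s for s in substrate]
    slopes = []
    for row in rates:
        inv_v = [1.0 / v for v in row]
        slope, _ = linear_fit(inv_s, inv_v)   # double-reciprocal plot: 1/v versus 1/[S]
        slopes.append(slope)
    # Competitive inhibition: slope = (Km/Vmax) * (1 + [I]/Ki), a straight line in [I].
    # Replot slope versus [I]; intercept / gradient = Ki (the x-intercept is -Ki).
    gradient, intercept = linear_fit(inhibitor, slopes)
    return intercept / gradient, slopes


# Substrate and B428 levels follow the Fig. 2 legend; Km, Vmax and Ki are hypothetical.
substrate = [0.8, 1.0, 1.3, 2.0, 4.0]   # mM S2444
inhibitor = [0.0, 0.25, 0.5, 1.0]       # uM B428 (0, 250, 500, 1000 nM)
true_ki = 0.5                           # uM, hypothetical
rates = [[competitive_rate(s, i, vmax=1.0, km=0.5, ki=true_ki) for s in substrate]
         for i in inhibitor]
ki, slopes = ki_from_slope_replot(substrate, inhibitor, rates)
print(f"recovered Ki = {ki:.3f} uM")   # prints: recovered Ki = 0.500 uM
```

With real triplicate data the points scatter, so a weighted or direct nonlinear fit to the Michaelis-Menten equation is often preferred; the double-reciprocal replot is shown here because it is the analysis the text describes.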

Protein Crystallography. Crystals were obtained by the hanging drop vapor diffusion method. A typical well solution of 0.15 M Li2SO4, 20% polyethylene glycol MW 4000 in succinate buffer, pH 4.8-6.0, was used. On the coverslip, 2 µl of well solution was mixed with 2 µl of protein solution, and the slip was sealed over the well. Crystallization occurred at 18-24 °C within 24 h. The protein solution was composed of 6 mg/ml (0.21 mM) micro-urokinase in 10 mM citrate, pH 4.0, 3 mM ε-aminocaproic acid p-carbethoxyphenyl ester chloride with 1% Me2SO cosolvent. The resulting micro-urokinase crystals are composed of enzyme with an empty active site. The compound ε-aminocaproic acid p-carbethoxyphenyl ester chloride is reported to inhibit urokinase with an apparent Ki of 0.3 µM at neutral pH and was co-crystallized with urokinase in an attempt to obtain a complex structure (32). Repeated tests with this compound resulted in a structure with an active site occupied only by ordered solvent molecules, even at 1.5 Å resolution. Hence, we hypothesize that this inhibitor is degraded during the crystallization experiment, although it is critical for obtaining urokinase crystals. Studies are under way to understand the mechanism of this phenomenon.

The micro-urokinase crystals belong to space group P2₁2₁2₁ with unit cell dimensions of a = 55.16 Å, b = 53.00 Å, c = 82.30 Å and α = β = γ = 90°, and diffract beyond 1.5 Å on a Rigaku RTP 300 RC rotating anode source equipped with an R-AXIS II detector. In addition, a 1.03 Å resolution native data set was collected on a CCD detector at beam line F1 of the Cornell High Energy Synchrotron Source in Ithaca, NY. All data were collected at 100-160 K and processed with the program package DENZO (33). Before the crystals were frozen, they were passed through a solution of 0.15 M Li2SO4, 20% polyethylene glycol MW 4000, succinate buffer, pH 4.8-6.0, and 20% glycerol for cryoprotection. Data were collected at low temperature to preserve the diffraction of the crystal throughout data acquisition. The crystal structure was determined by molecular replacement using the program AMoRe (34), with the LMW urokinase structure (1) (Protein Data Bank entry 1LMW) as the search probe against the R-AXIS II data.

The structure was refined to 1.5 Å resolution using the synchrotron data and the program package XPLOR (35) by a combination of rigid body, simulated annealing maximum likelihood, and maximum likelihood positional refinement. Electron density maps to 1.5 Å resolution were inspected on a Silicon Graphics INDIGO2 workstation using the program package QUANTA 97 (Molecular Simulations, Inc.). At 1.5 Å resolution, constrained individual temperature factor refinement was also included in the refinement cycle. Water molecules and bound ions were identified as positive peaks in the Fo - Fc map at least 4σ above noise. Refinement continued with automatic water addition using the XWAT feature of SHELXL (36). Final refinement steps included cycles of model building in which disorder and additional solvent molecules were added. The final R-factor is 19.2%, with an Rfree of 21.8%.
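The R-factor and Rfree quoted above follow the usual crystallographic definition R = Σ|Fobs - Fcalc| / ΣFobs, with Rfree computed over a test set of reflections withheld from refinement (10% of the data, according to the Table I footnote). The sketch below is a minimal illustration assuming Fcalc has already been placed on the same scale as Fobs; the amplitudes are hypothetical, and this is not the XPLOR implementation.

```python
import random


def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(|Fobs - Fcalc|) / sum(Fobs), assuming Fcalc is already scaled to Fobs."""
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def split_free_set(n_reflections, free_fraction=0.10, seed=0):
    """Randomly flag a fraction of reflection indices as the free (test) set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    return set(indices[:round(free_fraction * n_reflections)])


def r_work_and_free(f_obs, f_calc, free_set):
    """R over the working reflections and over the withheld free reflections."""
    work = [i for i in range(len(f_obs)) if i not in free_set]
    test = sorted(free_set)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in test], [f_calc[i] for i in test])
    return r_work, r_free


# Hypothetical amplitudes for 1000 reflections, with Fcalc within +/-20% of Fobs.
rng = random.Random(1)
f_obs = [rng.uniform(10.0, 100.0) for _ in range(1000)]
f_calc = [fo * rng.uniform(0.8, 1.2) for fo in f_obs]
free = split_free_set(len(f_obs), free_fraction=0.10)
r_work, r_free = r_work_and_free(f_obs, f_calc, free)
print(f"Rwork = {r_work:.3f}, Rfree = {r_free:.3f}")
```

Because no model is actually fitted in this toy example, the two values come out similar; in real refinement the model is optimized against the working set only, so Rfree is normally the higher of the two and serves as the check against overfitting.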

To obtain the amiloride, B428, and phenylguanidine micro-urokinase complex structures, crystals of micro-urokinase were placed in 50 µl of crystallization mother liquor to which 0.5 µl of a 1 mg/10 µl compound solution was added. The solid compounds were obtained from the Abbott chemical repository and were initially dissolved in Me2SO. Crystals were incubated for 12-15 h at 24 °C and prepared for data collection in a manner identical to that used for the native crystals. Data were collected at 160 K by flash freezing on a Rigaku RTP 300 RC rotating anode source equipped with an R-AXIS II detector and were processed using the HKL program suite (33). Initial electron density maps were calculated using the program package XPLOR (35) and the 1.5 Å native model. All electron density maps were inspected on a Silicon Graphics INDIGO2 workstation using QUANTA 97, and the orientation of each compound was clearly visualized in the initial 2Fo - Fc map. The complexes were refined to 2.0 Å resolution using the program package XPLOR. Refinement consisted of alternating steps of positional and B-factor refinement. Ordered solvent molecules were identified as positive peaks in the Fo - Fc map that were 4σ above noise.
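Ordered solvent molecules were picked as positive Fo - Fc peaks 4σ above noise. The sketch below is a minimal peak-picking illustration assuming the map is a periodic 3D grid covering one unit cell and taking σ as the rms deviation of the map values; the toy grid is hypothetical, and real crystallographic programs additionally check distances to the model and to potential hydrogen-bonding partners before accepting a water.

```python
def pick_peaks(density, n_sigma=4.0):
    """Return grid indices (i, j, k) that are local maxima above mean + n_sigma * rms."""
    nx, ny, nz = len(density), len(density[0]), len(density[0][0])
    values = [v for plane in density for row in plane for v in row]
    mean = sum(values) / len(values)
    sigma = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    threshold = mean + n_sigma * sigma
    peaks = []
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                v = density[i][j][k]
                if v < threshold:
                    continue
                # Compare with the 26 neighbours; indices wrap around because a
                # map covering exactly one unit cell is periodic.
                is_max = all(
                    v >= density[(i + di) % nx][(j + dj) % ny][(k + dk) % nz]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1) for dk in (-1, 0, 1)
                    if (di, dj, dk) != (0, 0, 0)
                )
                if is_max:
                    peaks.append((i, j, k))
    return peaks


# Hypothetical 8 x 8 x 8 difference map with one strong and one weak feature.
n = 8
grid = [[[0.0] * n for _ in range(n)] for _ in range(n)]
grid[3][4][5] = 10.0   # strong positive peak, e.g. a candidate water
grid[6][1][2] = 1.0    # weak feature that stays below the 4-sigma threshold
peaks = pick_peaks(grid, n_sigma=4.0)
print(peaks)  # [(3, 4, 5)]
```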

Table I summarizes statistics for all micro-urokinase models. All data are between 89 and 90% complete, with a merging Rsym between 7 and 11% and an I/σ between 12 and 15. The native model is refined to an Rfactor of 19.2% and an Rfree of 21.8% at 1.5 Å resolution. The overall B-factor for the protein is 12 Å², and the overall B-factor for the 337 ordered solvent molecules is 26 Å². The current native model also contains three ordered sulfate ions and two alternate side chain conformations located at the active site. All backbone atoms are well defined in the final 2Fo - Fc map, with atomic B-factors at or below 30 Å². The B428 model is refined to 2.0 Å resolution with an Rfactor of 20.9% and an Rfree of 27.7%, whereas the amiloride model is refined to 2.2 Å resolution with an Rfactor of 21.5% and an Rfree of 29.1%. The phenylguanidine model is refined to 2.0 Å resolution with an Rfactor of 18.9% and an Rfree of 22.1%. Data for the complex structures were of quality comparable with that of native data collected under the same conditions on a rotating anode source.

Table I
Data quality statistics
a Value of the Rfree, where 10% of the data were randomly removed from the refinement.

RESULTS 

Redesign of LMW Urokinase. To redesign the LMW urokinase sequence for the purpose of improving the crystal characteristics, the LMW urokinase coordinate file (Protein Data Bank entry 1LMW) was examined for sequences of excessively high B-factor, which suggest areas of disorder. The hypothesis is that areas of high disorder in the structure may contribute to the overall disorder of the crystals and/or may interfere with optimal crystal packing. The LMW urokinase structure consists of residues 136-158 of the A-chain and 159-411 of the B-chain, connected by a disulfide bridge between Cys148 and Cys279 (urokinase numbering).¹ The B-chain corresponds to the serine protease domain, whereas the 21-residue A-chain lacks the kringle and epidermal growth factor domains present in full-length urokinase. The A-chain is reported to be an area of high disorder (1), and examination of the Protein Data Bank coordinate file (entry 1LMW) reveals that residues 148-155 of the A-chain have an average B-factor of 64 Å², ranging from 26 Å² for the disulfide-linked sulfur of Cys148 to 110 Å² for a proline residue in this segment. These very high B-factors for the LMW urokinase A-chain confirm the reported disorder. Consequently, the A-chain was removed as a first step in the redesign. Furthermore, to remove the resulting free thiol on the B-chain, Cys279 was mutated to alanine.
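The disorder analysis above relies on atomic B-factors read from a coordinate file. In the PDB format, the chain identifier sits in column 22, the residue number in columns 23-26, and the temperature factor in columns 61-66 of each ATOM/HETATM record. The sketch below averages B over all atoms in a residue range of one chain; averaging over atoms (rather than over per-residue means) is an assumption, and the records built in the demo are hypothetical, not taken from 1LMW.

```python
def mean_b_factor(pdb_lines, chain_id, first_res, last_res):
    """Mean B-factor over ATOM/HETATM records of chain_id with first_res <= resSeq <= last_res."""
    b_values = []
    for line in pdb_lines:
        if line[0:6] not in ("ATOM  ", "HETATM"):
            continue
        if line[21] != chain_id:                    # column 22: chain identifier
            continue
        res_seq = int(line[22:26])                  # columns 23-26: residue number
        if first_res <= res_seq <= last_res:
            b_values.append(float(line[60:66]))     # columns 61-66: temperature factor
    if not b_values:
        raise ValueError("no atoms in the requested residue range")
    return sum(b_values) / len(b_values)


def atom_line(serial, name, res_name, chain_id, res_seq, b_factor):
    """Build a hypothetical fixed-column ATOM record (coordinates set to zero)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain_id}{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.00:6.2f}{b_factor:6.2f}")


records = [
    atom_line(1, "SG", "CYS", "A", 148, 20.0),
    atom_line(2, "CA", "PRO", "A", 150, 40.0),
    atom_line(3, "CB", "PRO", "A", 150, 60.0),
    atom_line(4, "CA", "ILE", "B", 159, 15.0),   # different chain: excluded below
]
print(mean_b_factor(records, "A", 148, 155))  # 40.0
```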

Further examination of the LMW urokinase coordinate file indicates a second area of disorder consisting of residues 405-411 of the C terminus, where the average B-factor is 147 Å². Residues 407-411 represent a five-residue extension in urokinase relative to other trypsin-like serine proteases. However, because residues 405-406 also have high atomic B-factors, the entire 405-411 segment was removed. The final potential site of disorder is the glycosylation site at residue 302. This glycosylation site was removed by an N302Q mutation to facilitate expression of the glycosylation-free protein in baculovirus. Hence, the re-engineered urokinase (micro-urokinase) consists of residues Ile159-Lys404 (Ile16-Lys243, chymotrypsin numbering system) with the two point mutations C279A (C122A) and N302Q (N145Q).

¹ The urokinase numbering system is used for discussion of the sequence re-engineering work, whereas the chymotrypsin numbering system as aligned by Ref. 1 is used for discussion of the serine protease domain structure of micro-urokinase.

Micro-urokinase Crystal Packing. Micro-urokinase crystallizes with a monomer in the asymmetric unit (P2₁2₁2₁), whereas the LMW urokinase crystal form has a dimer in the asymmetric unit (R3) with intimate contacts at the substrate-binding site. Specifically, in LMW urokinase, residues 94-101 from each molecule (chymotrypsin numbering system as aligned by Ref. 1)¹ form a series of intermolecular main chain hydrogen bonds, resulting in an extended four-stranded β-sheet (1). The LMW urokinase structure showed that this loop decreases the size of the S4 pocket relative to that at the substrate-binding site of other serine proteases such as thrombin, Factor Xa, and tissue plasminogen activator (1, 37-39). Hence, this loop provides a critical structural feature of the substrate-binding groove. However, because of the close crystal contact at this site in the LMW urokinase crystals, the substrate-binding site might have been distorted or conformationally restricted. The new crystal form of micro-urokinase lacks the close crystal contact present in LMW urokinase, and an overlay of the two structures indicates that the conformation of this loop is essentially identical in the two crystal forms. Consequently, it is unlikely that packing in either crystal system affects the conformation of this loop and the resulting shape of the S4 pocket, although the more open micro-urokinase packing may allow for inhibitor-induced conformational shifts.

Examination of crystal packing at the A-chain-binding cleft gives insight into why micro-urokinase yields different lattice packing and better diffracting crystals (a sample of the final 2Fo - Fc electron density map at 1.5 Å resolution is shown in Fig. 1A). In LMW urokinase, the A-chain binds in a cleft composed of residues 25-29, 116-122, and 201-208. In the crystal structure of micro-urokinase, there is no A-chain, and the A-chain-binding cleft is partially occupied by a symmetry-related molecule. Specifically, a hydrophobic loop extending from residue 144 to 150 of the symmetry-related molecule is bound directly at the A-chain site, such that the hydroxyl of a tyrosine in this loop makes two hydrogen bonds at the A-chain cleft, to a serine main chain N and a serine main chain O. In LMW urokinase, the A-chain blocks this set of interactions. Thus, in micro-urokinase, removal of the A-chain exposes a new "binding site" for the 144-150 loop of another micro-urokinase molecule, permitting a new lattice to form. This interaction at the A-chain cleft probably contributes to the improved crystal quality both by serving as a site of nucleation and by facilitating very close contact between adjacent molecules.






Crystal Structures of Urokinase at High Resolution 




[Fig. 1, panels A-C; panel labels include H99, C58, and C42]

Fig. 1. A, final 2Fo - Fc electron density map contoured at 1σ for native micro-urokinase at 1.5 Å resolution. Residues 146-148 are depicted in thick lines. B, 2Fo - Fc (purple) and Fo - Fc (green) maps at His99. The 2Fo - Fc map is contoured at 1σ, and the Fo - Fc map is contoured at 3σ. The maps are from refinement of the side chain in one conformation. C, 2Fo - Fc (purple) and Fo - Fc (green) maps at Cys42. The 2Fo - Fc map is contoured at 1σ, and the Fo - Fc map is contoured at 3σ. The maps are from refinement of the side chain in one conformation.

Micro-urokinase and LMW urokinase are nearly identical in structure (overall rms deviation for main chain atoms, 0.8 Å), with one significant structural change near a site of re-engineering. As discussed above, removal of the A-chain results in an empty cavity. One loop (201-210) forming this site undergoes a conformational shift relative to LMW urokinase, with main chain rms deviations ranging from 1.1 to 1.8 Å and the largest shift at an arginine residue. However, although this loop is involved in a crystal packing interaction, the conformation of the 144-150 loop of the symmetry-related molecule is the same for both micro-urokinase and LMW urokinase. Other sites of variation include the flexible loop at residues 37-37D (main chain rms deviation, 1.7-3.5 Å), residues 17-19 (main chain rms deviation, 1.1-2.1 Å), and residues 185B-186 (main chain rms deviation, 1.7 Å). All of these areas had high B-factors in the LMW urokinase structure (B-factors > 60-90 Å²) but significantly lower B-factors in the micro-urokinase structure (B-factors < 20 Å²), with the exception of residues 17-19, which had low B-factors in both structures. The 17-19 segment was clearly defined in the final 2Fo - Fc electron density maps of micro-urokinase and is not near any re-engineered site. Residues 185B-186 were remodeled in the higher resolution structure. In the lower resolution LMW urokinase structure, the tryptophan side chain in this segment was exposed to solvent and the glutamine side chain was buried; the higher resolution data clearly placed the tryptophan in the protein core, with the glutamine exposed to solvent.
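The main chain rms deviations quoted above compare matched atoms after the two models have been superimposed. One standard way to compute such a value is the Kabsch algorithm (singular value decomposition of the coordinate covariance matrix). The NumPy sketch below illustrates that technique; it is not necessarily the superposition procedure used for these structures, and the coordinates are random, hypothetical points.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)               # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                        # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T)) # -1 would mean a reflection; correct for it
    corr = np.diag([1.0, 1.0, d])
    rot = vt.T @ corr @ u.T                # rotation taking p_c onto q_c
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Hypothetical "main chain" coordinates: a rotated, translated copy should give RMSD ~ 0.
rng = np.random.default_rng(0)
p = rng.normal(scale=10.0, size=(50, 3))
theta = np.radians(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
q = p @ rot_z.T + np.array([5.0, -3.0, 2.0])
print(round(kabsch_rmsd(p, q), 6))          # 0.0
q_noisy = q + rng.normal(scale=0.5, size=q.shape)
print(kabsch_rmsd(p, q_noisy) > 0.0)        # True
```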

Active Site of Native Micro-urokinase. Like the overall molecular fold, the active sites of LMW urokinase and micro-urokinase are nearly identical (rms deviation, <0.8 Å). The higher resolution data did not reveal any large side chain movements relative to LMW urokinase but did show alternate side chain conformations for two residues (Fig. 1, B and C), in addition to a bound sulfate ion (see Fig. 3C). The sulfate ion is bound near the oxyanion hole (40), where O1 accepts hydrogen bonds from Gly193-NH (2.8 Å) and Ser195-OH (2.8 Å), whereas O2 accepts a hydrogen bond from His57-Nε2 (2.8 Å). Hence, the higher resolution data revealed more structural detail at the active site.

In Fig. 1B, native 1.5 Å 2Fo - Fc (contoured at 1σ) and Fo - Fc (contoured at 3σ) electron density maps show that the side chain of His99 is in multiple conformations. These maps were calculated before the alternate conformation had been included in the model. As presented in Fig. 1B, one His99 conformation is identical to that observed in LMW urokinase. In this conformation, His99-Nδ1 accepts a hydrogen bond from Tyr94-OH (2.9 Å). In the alternate conformation (modeled into the green positive peak; Fig. 1B), the His99 imidazole is rotated approximately 90° about the Cβ-Cγ bond, resulting in a different hydrogen bonding pattern. Here, His99-Nδ1 can donate a hydrogen bond to an aspartate Oδ1 (3.2 Å). The His99 side chain forms part of both the S2 and S4 pockets. Hence, a change in the conformation of His99 results in a change in the overall shape of S2 and S4, suggesting that this side chain movement would affect a drug design strategy directed toward the substrate-binding groove.

Table II
Inhibition constants determined for LMW urokinase and micro-urokinase
Ring numbering is shown in conjunction with the chemical structure for each inhibitor.

Inhibitor          Ki, LMW urokinase (µM)    Ki, micro-urokinase (µM)
B428               0.490 ± 0.018             0.512 ± 0.022
Amiloride          7.2 ± 0.2                 6.9 ± 0.4
Phenylguanidine    20.6 ± 1.0                17.4 ± 1.1

The side chain of Cys42 is also observed in two conformations and is near the active site (Fig. 1C). In what is likely the major conformation, the Cys42-Cys58 disulfide bridge is intact. However, in the alternate conformation, the disulfide is broken and the Cys42 thiol group lies in a small hydrophobic pocket formed by phenylalanine, isoleucine, and valine side chains. This side chain shift is unexpected, as the Cys42-Cys58 disulfide bridge is present in all trypsin-like serine protease structures, and its proximity to the catalytic triad suggests that it may structurally stabilize the active site. Hence, one might expect the catalytic activity to be affected when this disulfide bridge is broken. On the other hand, this observation was made in the solid state, and further solution work would be necessary to determine its physiological significance.
Fig. 2. Lineweaver-Burk analyses of B428 inhibition of micro-urokinase were performed in amidolytic chromogenic assays with S2444 as described under "Experimental Procedures." S2444 substrate concentrations were 0.8, 1.0, 1.3, 2.0, and 4.0 mM. B428 concentrations were 0 nM (▼), 250 nM (▲), 500 nM (●), and 1000 nM (■). Data represent the means of triplicate determinations. Ki values were determined by replots of slope versus inhibitor concentration (inset) and are presented in Table II.



Examination of crystal packing at the active site reveals that the micro-urokinase molecules pack to form a solvent channel that leads to the active site groove. Therefore, small molecule inhibitors may diffuse into the crystal and bind at the active site. This is important from a structure-based drug design perspective because it facilitates soaking as a method of forming protein-compound complex crystals. The soaking method was used to obtain crystal structures with three known urokinase inhibitors: B428, amiloride, and phenylguanidine. These structures were obtained at high resolution and provide a starting point for structure-based design of a nonpeptidic urokinase inhibitor.

B428. B428 has been reported to inhibit human urokinase with an IC50 value of 0.320 µM (Refs. 25 and 26 and Table II). B428 inhibition was tested against LMW urokinase and micro-urokinase, and Fig. 2 presents the Lineweaver-Burk analysis of the effect of B428 on the activity of micro-urokinase. The results show that B428 competitively inhibits micro-urokinase, as observed for the native enzyme (25, 26). As listed in Table II, B428 inhibits LMW urokinase with a Ki of 0.490 µM and micro-urokinase with a Ki of 0.512 µM. Hence, Ki values for the native and re-engineered forms of the protein are essentially identical and are consistent with reported IC50 values (25, 26).

The B428-micro-urokinase co-crystal structure was completed to 2.0 Å resolution. In the complex structure, the 2Fo - Fc and Fo - Fc maps indicate that His99 is in two conformations, as observed in the native structure, although Cys42 is observed only in the conformation in which the Cys42-Cys58 disulfide bridge is intact. It is unclear why only one conformation is observed for the Cys42-Cys58 disulfide. In the native structure, the alternate conformation became visible at high resolution. Hence, one possibility is that the second conformation is not visible in the lower resolution electron density map. Another explanation is that inhibitor binding may induce a shift to a single conformation or that the inhibitor may bind only to the protein form in which the disulfide is intact. Further experiments at high resolution will be necessary to fully understand this phenomenon. Fig. 3A shows the 2Fo - Fc (contoured at 1σ) and Fo - Fc (contoured at 3σ) electron density maps calculated in the absence of inhibitor and before any refinement cycles. All atoms of the inhibitor are clearly defined in both maps, and the compound is found to bind at the S1 pocket, as might be predicted from its net positive charge.

Fig. 3. A, initial 2Fo - Fc (purple) and Fo - Fc (green) maps contoured at 1 and 3σ, respectively, for the binding site of B428 before refinement. B, molecular surface as calculated by the program package QUANTA (Molecular Simulations Inc.) depicting interactions between B428 and micro-urokinase. The inhibitor and inhibitor surface are shown in orange, whereas the protein and the protein surface are shown in cyan. C, view of B428 bound at the S1 site of urokinase. The S2 site between His57 and His99 is also shown, as well as the S4 site. An ordered sulfate ion is also shown bound near the oxyanion hole.

Interactions between B428 and the S1 pocket are consistent with observations for trypsin and other trypsin-like enzymes (41-45). Nearly all atoms of B428 are in van der Waals' or hydrogen bonding contact with the S1 site (Fig. 3, B and C). The inhibitor does not occupy other pockets of the substrate-binding groove. The benzothiophene ring is in contact with the rim of the S1 site, which is composed of the Cys191-Cys220 disulfide bridge and the main chain atoms of adjoining Ser-Cys and Gln-Cys segments. In the pocket, the thiophene ring is also in contact with the side chains of a valine, an aspartate, and two serine residues. The amidine donates hydrogen bonds to Ser190-Oγ (3.0 Å), Asp189-Oδ1 (2.8 Å), Asp189-Oδ2 (2.8 Å), and Gly219-O (2.7 Å) (Fig. 3B). Hence, both hydrophobic and hydrophilic interactions occur at S1.
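The hydrogen bonds listed for the B428 amidine are donor-acceptor distances of 2.7-3.0 Å. The sketch below lists donor-acceptor pairs within a distance cutoff. The 3.5 Å cutoff and all coordinates are hypothetical assumptions chosen only for illustration (the atom labels echo the text, but the geometry is invented), and a real analysis would also check donor-hydrogen-acceptor angles.

```python
import math


def hbond_candidates(donors, acceptors, cutoff=3.5):
    """Return (donor, acceptor, distance) tuples for pairs within `cutoff` angstroms, shortest first."""
    hits = []
    for d_name, d_xyz in donors.items():
        for a_name, a_xyz in acceptors.items():
            dist = math.dist(d_xyz, a_xyz)
            if dist <= cutoff:
                hits.append((d_name, a_name, round(dist, 2)))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical coordinates (angstroms), not taken from the deposited structure.
donors = {"amidine N1": (0.0, 0.0, 0.0), "amidine N2": (2.3, 0.0, 0.0)}
acceptors = {
    "Asp189 OD1": (0.0, 2.8, 0.0),
    "Asp189 OD2": (2.3, 2.8, 0.0),
    "Gly219 O": (0.0, 0.0, 2.7),
    "Ser190 OG": (2.3, 0.0, 3.0),
}
hits = hbond_candidates(donors, acceptors)
for donor, acceptor, dist in hits:
    print(f"{donor} -> {acceptor}: {dist} A")
```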

In addition to interactions at S1, the 4-iodo group points out of the S1 pocket away from the substrate-binding groove and makes van der Waals' interactions with the side chain of Cys220 and the main chain atoms of Gly218. These residues form part of a subpocket composed of the Cys191-Cys220 disulfide bridge, residues Gly218 and Ser146, and the side chain of Lys143. This pocket has been termed the S1β pocket because of its proximity to the primary site (Fig. 3C). The 4-iodo group of B428 is reported to confer a 10-fold increase in binding potency relative to the 4-hydro compound (25, 26). This observation is consistent with the B428-urokinase crystal structure, in which the 4-iodo group partially accesses the S1β pocket. Furthermore, B623 inhibits urokinase with an IC50 of 0.07 µM (25, 26). Based upon the crystal structure of B428-micro-urokinase, it is possible that this larger 4-substituent occupies more of the S1β pocket² and consequently binds more tightly to urokinase. Hence, access to this novel pocket has been shown to confer an increase in binding potency and may serve as a site for further substitution in structure-based drug design.

Fig. 4. A, initial 2Fo - Fc (purple) and Fo - Fc (green) maps contoured at 1 and 3σ, respectively, for the binding site of amiloride before refinement. B, molecular surface as calculated by the program package QUANTA (Molecular Simulations Inc.) depicting interactions between amiloride and micro-urokinase. The inhibitor and inhibitor surface are shown in peach, whereas the protein and protein surface are shown in cyan. C, overlay of the crystal structures of amiloride (purple) and B428 (orange) bound to micro-urokinase, showing that the halogen atoms of each inhibitor occupy the same site.

Examination of the crystal structure of B428-urokinase shows that the 5 and 6 positions of the benzo[b]thiophene-2-carboxamidine are also open for substitution, whereas the 3 and 7 positions are buried within the pocket and therefore less likely to accommodate a substituent. Of these, the 5 position does not point directly toward any pocket of the urokinase molecule, because it points toward Gln192 and out toward bulk solvent. Hence, substitution at this position is less likely to confer a large increase in binding potency. On the other hand, the 6 position points toward the urokinase catalytic site, although the position appears partially blocked by the side chain of the active site Ser195. The distance from Ser195-OH to the 6-position carbon is 3.2 Å; therefore, incorporation of a substituent at this position may require a shift of the benzothiophene scaffold away from Ser195. Additionally, substituents at the 6 position would not be oriented toward the substrate-binding groove accessed by Glu-Gly-Arg-chloromethyl ketone; they would have to bend back toward the substrate-binding site or access other subsites. Nevertheless, the 4 and 6 positions appear to be the best substitution sites for increasing the binding potency of B428, and both sets of substituents will likely occupy sites apart from the substrate-binding groove.

² The crystal structure of B623 in complex with urokinase could not be completed because of solubility issues with the compound.

Amiloride. Amiloride has been reported to inhibit human urokinase with a Ki (24) or IC50 of 7 µM (25, 26). As observed with B428, amiloride also competitively inhibits LMW urokinase and micro-urokinase with similar Ki values (Ki = 7.2 µM for LMW urokinase and Ki = 6.9 µM for micro-urokinase). Amiloride is a weaker urokinase inhibitor than B428 (Table II) but may have more favorable pharmacological properties because the compound is an orally active commercial drug (46). To compare the binding modes of amiloride and B428 and to establish strategies for development of a more potent amiloride-based urokinase inhibitor, the co-crystal structure of amiloride-micro-urokinase was completed at 2.2 Å resolution.

Examination of the 2Fo - Fc (contoured at 1σ) and Fo - Fc (contoured at 3σ) electron density maps at the active site shows that all atoms of the inhibitor are clearly defined in both maps (Fig. 4A). In addition, the maps show His99 in two conformations and the Cys42-Cys58 disulfide bridge intact, as observed in the B428 complex. The data also indicate that amiloride binds at the S1 pocket, as observed with B428 (Fig. 4C).

The crystal structure of amiloride-micro-urokinase indicates that amiloride makes more hydrogen bonding interactions at the S1 site than B428 while maintaining some of the van der Waals' interactions within the pocket. The amiloride pyrazine scaffold is smaller than the B428 benzothiophene, so even though the pyrazine ring is in contact with the rim of the S1 pocket, as observed for B428, the extent of the packing interactions is smaller. In place of the thiophene ring, the 3-amino and 2-acylguanidine groups of amiloride make hydrogen bonding interactions. Specifically, the 3-amino group is packed underneath a serine side chain, as shown in Fig. 4B, where it donates a hydrogen bond to a serine Oγ (3.1 Å). The carbonyl of the acylguanidine group accepts a hydrogen bond (2.9 Å) from a buried solvent molecule bound directly above Tyr228. The guanidine-NH donates a hydrogen bond to a glycine main chain O (3.1 Å). As observed with B428, the amidine-like nitrogens donate hydrogen bonds to Gly219-O (2.7 Å) and Asp189-Oδ1 (3.0 Å), or to Asp189-Oδ2 (3.0 Å) and Ser190-Oγ (2.7 Å). The hydrogen bonding geometry of the guanidinium group is also very similar to that observed for Arg P1 in the Glu-Gly-Arg-chloromethyl ketone-LMW urokinase structure (1). Hence, although the core scaffolds of both B428 and amiloride are bound at the S1 pocket, the nature of the interactions within the pocket is different.

The crystal structure of amiloride-micro-urokinase reveals strategies for structure-based design of a more potent small molecule inhibitor. One potential site of substitution is the 6 position. The 6-chloro group of amiloride accesses the S1β pocket, as observed for the 4-iodo group of B428. Specifically, the 6-chloro group is in hydrophobic contact with the side chain of Cys220 and the main chain atoms of Gly218 (Fig. 4C). Thus, although the chemical structures of B428 and amiloride are very different, their interactions at the S1β pocket are nearly identical. Because of this similarity, one might replace the 6-chloro group of amiloride with larger groups such as iodine (present in B428) or a benzodioxol arylethenyl (present in B623), both of which were shown to enhance activity in the benzo[b]thiophene-2-carboxamidine series. The 5 position of amiloride within the S1 pocket is another site for substitution. However, substituents at this site are expected to point toward Gln192 and then out toward bulk solvent, as observed for the 5 position of B428. Thus, a rigid linker may be necessary to redirect substituents toward the protein, including the substrate-binding groove. In summary, substitutions of the amiloride scaffold should occur at the 5 and 6 positions to provide direct access to the S1β pocket or indirect access to other sites on the protein.

Phenylguanidine. Phenylguanidine inhibits urokinase with a Ki of 20.6 µM (27) and is therefore a weaker inhibitor of urokinase than either amiloride or B428 (Table II). This inhibitor also competitively inhibits micro-urokinase with a Ki consistent with that for the LMW form (Ki = 20.6 µM for LMW urokinase and Ki = 17.4 µM for micro-urokinase). To compare the binding mode of this inhibitor with those of amiloride and B428 and to determine potential sites of substitution, the co-crystal structure of phenylguanidine-micro-urokinase was completed at 2.0 Å resolution.

The phenylguanidine-micro-urokinase active site structure is very similar to those in the presence of B428 and amiloride. His99 is observed in multiple conformations, while the Cys42-Cys58 disulfide bridge is intact. Additionally, the 2Fo - Fc (contoured at 1σ) and Fo - Fc (contoured at 3σ) electron density maps (Fig. 5A), obtained using the urokinase model in the absence of inhibitor and before any refinement cycles, show that all atoms of the inhibitor are clearly defined in both maps. The inhibitor was found to bind at the S1 pocket (Fig. 5B).

Even though amiloride and phenylguanidine have scaffolds of the same size, the phenyl ring of phenylguanidine binds very differently from the pyrazine ring of amiloride (Fig. 5, B and C). Specifically, the phenylguanidine ring packs underneath a serine side chain and interacts with the main chain atoms of a Val-Trp segment as well as a valine side chain. The ring also interacts with the main chain atoms of Ser190-Cys191 as well as the side chain of Ser190. The differential ring packing is most likely due to amiloride possessing one additional linker atom between the guanidine and aromatic groups relative to phenylguanidine (Table II), because the guanidine groups are oriented very similarly. Specifically, the guanidine-NH donates a hydrogen bond to a glycine main chain O (3.0 Å), whereas the amidine-like nitrogens donate hydrogen bonds to Gly219-O (2.9 Å) and Asp189-Oδ1 (2.9 Å), or to Asp189-Oδ2 (3.0 Å) and Ser190-Oγ (3.3 Å). Thus, it is likely that the core scaffold of amiloride (pyrazine ring) orients differently from the phenyl group of phenylguanidine because binding is driven by the hydrogen bonding geometry of the guanidine groups rather than by the van der Waals'/hydrogen bonding interactions of the core groups, even though interactions of the core groups certainly contribute to compound binding.

The phenylguanidine-urokinase structure also shows that Gln192
has changed conformation and is in hydrophobic contact with the
inhibitor (Fig. 5B), such that it blocks the entrance to the S1β
pocket. In the native and the B428 or amiloride complex
structures, the S1β pocket is open, with Gln192 accepting a
hydrogen bond from Lys^"*^ (3.3 Å) and donating a hydrogen bond
to Tyr^^^ (3.1 Å). Thus, a conformational shift of this side
chain requires breaking two hydrogen bonds. This is not the case
for other serine proteases such as thrombin, where there is no
hydrogen bonding partner for Glu192 in either position. There,
the energy barrier to a conformational shift of Glu192 is lower,
and the side chain may be found in both conformations (49, 50).
For urokinase, it appears that the binding of certain inhibitors
such as phenylguanidine does break the two Gln192 hydrogen bonds
and shifts Gln192 to maximize hydrophobic desolvation of the
compound. Hence, because Gln192 may be induced to shift
conformation and may act as a switch controlling the entrance to
S1β from S1, the orientation of this side chain is important to
consider in a drug design strategy.

The crystal structure of phenylguanidine-urokinase suggests a
structure-based drug design strategy different from that with
B428 or amiloride. Both B428 and amiloride are capable of
directly accessing the S1β pocket, whereas the binding
orientation of phenylguanidine is such that a similar interaction
cannot be achieved by direct substitution of the phenyl ring
(Fig. 5C), even with movement of Gln192 to the S1β-open position.
Specifically, as shown in Fig. 5 (B and C), the 2 and 3 positions
could point toward the S1β pocket but are too far away to support
direct interaction with S1β. In fact, substitution of the phenyl
ring with halogens at both the 2 and 3 positions did not result
in any increase in inhibitory potency (27). On the other hand,
substitution at position 4 with a chloro or trifluoromethyl group
increased inhibition, giving Ki values of 6.8 and 6.5 µM,
respectively (27). This 4-substituent is expected to orient
toward the side chain of Ser^^^ and may gain binding energy from
a favorable van der Waals packing interaction with Ser^^^ and
the S1 pocket. The 5 and 6 positions are within the S1 pocket
and therefore less open to substitution. Because interactions
with the S1β pocket are expected to increase binding potency and
because phenylguanidine may not directly access this site,
modification of the scaffold may be a promising drug design
strategy for this series.

Fig. 5. A, initial 2Fo − Fc (purple) and Fo − Fc (green) maps
contoured at 1σ and 3σ, respectively, for the binding site of
phenylguanidine before refinement. B, molecular surface of
micro-urokinase as calculated by the program package QUANTA
(Molecular Simulations Inc.) depicting interactions between B428
and micro-urokinase. The inhibitor and inhibitor surface are
shown in orange, whereas protein and protein surface are shown
in cyan. C, overlay of the crystal structures of amiloride
(purple) and phenylguanidine (black) micro-urokinase, showing
that the two scaffolds occupy different areas of the S1 pocket.

Further examination of an overlay of the crystal structures of
phenylguanidine and amiloride micro-urokinase (Fig. 5C) shows
that the binding of the two scaffolds is complementary. The lack
of overlap between the two groups suggests that the phenyl and
pyrazine rings could be fused to form a 1-naphthylguanidine
system. The naphthyl ring would be expected to occupy the sites
of both core scaffolds and could therefore retain the favorable
characteristics of both the phenylguanidine and amiloride series.
These would include use of the 4-chloro or 4-trifluoromethyl
substitutions of the phenylguanidine series as well as access to
the S1β pocket exploited by amiloride and B428. Hence, merging the
amiloride and phenylguanidine scaffolds would be predicted to
benefit from the additivity of both sites and to create a more
potent and more easily optimized urokinase inhibitor.

DISCUSSION 

Urokinase inhibitors have been shown to affect tumor metastasis
and growth in vivo, making urokinase an attractive anti-cancer
target. However, the existing compounds do not have all of the
properties required of a therapeutic agent and need
optimization. Crystallography-driven structure-based drug design
based on a series of ligand-protein crystal structures can be
used to optimize urokinase inhibition. The properties of the
protein crystals can affect the efficiency of structure-based
drug design because a larger number of more accurate structures
provides a better description of the relationship between
binding interactions and binding energy. Fortunately, advances
in molecular biology can be used to engineer the protein to
obtain crystal systems that allow faster and more exact
structure determinations and enhance the drug design cycle (47).
Such a method has been used to design a crystal system for human
urokinase for optimization of a urokinase inhibitor.

The sequence of LMW urokinase was redesigned to produce a new
crystal form that would provide a better system for
structure-based drug design. Specifically, LMW urokinase was
re-engineered to minimize the areas of disorder that are likely
to cause suboptimal crystal packing. This recombinant protein,
micro-urokinase, produces crystals with close packing
interactions at the A-chain cleft, which would be blocked in LMW
urokinase. This close molecular packing results in crystals that
diffract to high resolution on a rotating anode source
(1.6-2.0 Å). However, even though the micro-urokinase molecules
are closely packed, the active site is both unoccupied and open
to solvent channels in the crystal. This property allows
compounds to be readily diffused into the crystal and has
facilitated the determination of crystal structures in the
presence of three reported urokinase inhibitors toward the
design of an anti-cancer agent.

The micro-urokinase crystal system and soaking method were used
to determine the co-crystal structures of micro-urokinase
complexed with the inhibitors B428 (25, 26), amiloride (24), and
phenylguanidine (27). Each of the co-crystal structures gives
insight into favorable compound-protein interactions that
contribute to the binding of these inhibitors to urokinase. The
primary binding force is likely the hydrogen bonds between each
inhibitor's amidine or guanidine group and Asp189. This salt
bridge interaction is common to many guanidine or amidine
complexes with trypsin or trypsin-like serine proteases such as
thrombin, factor Xa, or tissue plasminogen activator (41-45) and
is observed for Arg-P1 in the Glu-Gly-Arg-chloromethyl ketone LMW
urokinase structure (1). In addition to the hydrogen bonding
interactions, van der Waals packing between the core scaffold
and the S1 pocket may also contribute to the overall binding
energy. Hydrophobic packing at the S1 pocket is the primary
binding interaction for substrates/inhibitors in the chymotrypsin
family of proteases, where the S1 pocket contains no charged
groups (48-51). Additionally, a series of thrombin inhibitors
that lack a positively charged group to interact with Asp189 have
been described (52, 53). Hence, both hydrophilic and hydrophobic
interactions at the S1 pocket contribute to the binding of B428,
amiloride, and phenylguanidine, and these interactions are also
present in other crystal structures.

Examination of the urokinase structures reveals a new binding
site adjacent to the S1 pocket. The site, termed the S1β
subpocket, is composed of the disulfide bridge at Cys191-Cys220,
residues Ser''**^ and Gly^**^, and the side chain of Lys^^"*. The
S1β subpocket is also present in the LMW urokinase structure
(Protein Data Bank entry 1LMW) and is away from any re-engineered
sites. The crystal structure of phenylguanidine-urokinase reveals
that Gln192 may act as a switch for the closing and opening of
S1β. In the native and B428 or amiloride complex structures, the
S1β pocket is open, and Gln192 is involved in two hydrogen bonds
(with Lys^"^^ and Tyr^^M). However, in the presence of other
inhibitors such as phenylguanidine or Glu-Gly-Arg-chloromethyl
ketone (1), the hydrogen bonds are broken, and the conformation
of Gln192 shifts such that its side chain is in van der Waals
contact with the inhibitor. In this conformation, the entrance to
S1β is blocked, and the shift is most likely induced to maximize
interactions with the inhibitor. Hence, although the S1β pocket
may be blocked by the induced movement of Gln192, its proximity
to S1 makes it an attractive subsite for structure-based drug
design.
The halogen atoms of B428 and amiloride interact with the
entrance to the S1β subsite (Gly^^®-Cys^^°). Interactions at this
site have been shown to confer a significant increase in
inhibitory potency in the benzo[b]thiophene-2-carboxamidine
series, where the 4-iodo (IC50 = 0.32 µM) and
4-benzodioxolanyletheyl (IC50 = 0.07 µM) compounds inhibit more
strongly than the 4-hydro compound (IC50 = 3.7 µM) (25, 26). The
increase in potency observed for both substitutions is most
likely due to packing interactions at the S1β pocket.
Phenylguanidine lacks a halogen atom to access the S1β pocket, and
examination of the structure reveals that the pocket cannot be
easily accessed by direct substitution of the phenylguanidine
ring. However, an overlay of the phenylguanidine crystal structure
with that of amiloride reveals that the two scaffolds could be
merged to form 1-naphthylguanidine. This compound could, in turn,
access the S1β pocket. Hence, urokinase co-crystal structures with
B428, amiloride, and phenylguanidine indicate that all three
scaffolds may provide either direct or indirect access to the S1β
pocket. Furthermore, this newly described subsite has great
potential for the future design of more potent urokinase
inhibitors for the treatment of cancer.
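To make the potency comparison above concrete, the short Python sketch below simply divides the quoted IC50 values. It treats the IC50 ratio as a rough, assay-dependent indicator of relative potency (an assumption: IC50 is not a binding constant), and the dictionary and function names are illustrative only.

```python
# IC50 values (in µM) quoted above for the benzo[b]thiophene-2-carboxamidine series.
IC50_UM = {
    "4-hydro": 3.7,
    "4-iodo": 0.32,
    "4-benzodioxolanyletheyl": 0.07,
}


def fold_improvement(reference_um, analog_um):
    """Return reference IC50 / analog IC50; values above 1 mean the analog is more potent."""
    return reference_um / analog_um


for name in ("4-iodo", "4-benzodioxolanyletheyl"):
    fold = fold_improvement(IC50_UM["4-hydro"], IC50_UM[name])
    print(f"{name}: IC50 about {fold:.1f}-fold lower than the 4-hydro compound")
```

On these numbers the 4-iodo and 4-benzodioxolanyletheyl compounds come out roughly 11.6-fold and 52.9-fold more potent than the 4-hydro parent.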
TRANSMEMBRANE SERINE PROTEASE OVEREXPRESSED IN
OVARIAN CARCINOMA AND USES THEREOF

BACKGROUND OF THE INVENTION 

Cross-Reference to Related Application

This application is a continuation-in-part patent
application and claims the benefit of priority under 35 U.S.C.
§120 of USSN 09/261,416, filed March 3, 1999.

Field of the Invention

The present invention relates generally to the fields of
cellular biology and diagnosis of neoplastic disease. More
specifically, the present invention relates to a transmembrane
serine protease termed Tumor Associated Differentially-Expressed
Gene-12 (TADG-12), which is overexpressed in ovarian carcinoma.

Description of the Related Art 

Tumor cells rely on the expression of a concert of
proteases to be released from their primary sites and move to
distant sites to inflict lethality. This metastatic nature is the result
of an aberrant expression pattern of proteases by tumor cells and
also by stromal cells surrounding the tumors [1-3]. For most
tumors to become metastatic, they must degrade their
surrounding extracellular matrix components, degrade basement
membranes to gain access to the bloodstream or lymph system,
and repeat this process in reverse fashion to settle in a secondary
host site [3-6]. All of these processes rely upon what now appears
to be a synchronized protease cascade. In addition, tumor cells
use the power of proteases to activate growth and angiogenic
factors that allow the tumor to grow progressively [1]. Therefore,
much research has been aimed at the identification of
tumor-associated proteases and the inhibition of these enzymes for
therapeutic purposes. More importantly, the secreted nature and/or
high-level expression of many of these proteases allows for their
detection at aberrant levels in patient serum, e.g., the
prostate-specific antigen (PSA), which allows for early diagnosis of
prostate cancer [7].

WO 00/52044 PCT/US00/05612

Proteases have been associated directly with tumor
growth, shedding of tumor cells and invasion of target organs.
Individual classes of proteases are involved in, but not limited to,
(1) the digestion of stroma surrounding the initial tumor area, (2)
the digestion of the cellular adhesion molecules to allow
dissociation of tumor cells, and (3) the invasion of the basement
membrane for metastatic growth and the activation of both tumor
growth factors and angiogenic factors.

For many forms of cancer, diagnosis and treatment have
improved dramatically in the last 10 years. However, the five-year
survival rate for ovarian cancer remains below 50%, due in
large part to vague symptoms that allow the disease to progress
to an advanced stage prior to diagnosis [8]. Although
the CA125 antigen has been useful as a marker
for monitoring recurrence of ovarian cancer, it has not proven to
be an ideal marker for early diagnosis. Therefore, new markers
that may be secreted or released from cells and that are highly
expressed by ovarian tumors could provide a useful tool for
early diagnosis and for therapeutic intervention in patients with
ovarian carcinoma.
The prior art is deficient in that the proteases overexpressed in
carcinoma have not been completely identified and, therefore,
in the lack of a tumor marker useful as an indicator of early
disease, particularly for ovarian cancers.
Specifically, TADG-12, a transmembrane serine protease, has not
been previously identified in either nucleic acid or protein form.
The present invention fulfills this long-standing need and desire
in the art.

SUMMARY OF THE INVENTION 


The present invention discloses TADG-12, a new
member of the Tumor Associated Differentially-Expressed Gene
(TADG) family, and a variant splicing form of TADG-12 (TADG-12V)
that could lead to a truncated protein product. TADG-12 is a
transmembrane serine protease overexpressed in ovarian
carcinoma. The entire cDNA of TADG-12 has been identified (SEQ
ID No. 1). This sequence encodes a putative protein of 454 amino
acids (SEQ ID No. 2), which includes a potential transmembrane
domain, an LDL receptor-like domain, a scavenger receptor
cysteine-rich domain, and a serine protease domain. These
features imply that TADG-12 is expressed at the cell surface, and
it may be used as a molecular target for therapy or a diagnostic
marker.
In one embodiment of the present invention, there is
provided a DNA fragment encoding a TADG-12 protein selected
from the group consisting of: (a) an isolated DNA fragment which
encodes a TADG-12 protein; (b) an isolated DNA fragment which
hybridizes to the isolated DNA fragment of (a) above and which
encodes a TADG-12 protein; and (c) an isolated DNA fragment
differing from the isolated DNA fragments of (a) and (b) above in
codon sequence due to the degeneracy of the genetic code, and
which encodes a TADG-12 protein. Specifically, the DNA fragment
has a sequence shown in SEQ ID No. 1 or SEQ ID No. 3.
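Clause (c) rests on the degeneracy of the genetic code: several codons specify the same amino acid, so many distinct DNA sequences encode one protein. The Python sketch below illustrates this with the standard genetic code; the four-residue peptide "MALK" is a hypothetical example, not part of TADG-12, and the helper names are illustrative.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codons(aa):
    """All codons that specify the one-letter amino acid `aa`."""
    return [codon for codon, residue in CODON_TABLE.items() if residue == aa]


def translate(dna):
    """Translate an in-frame coding sequence whose length is a multiple of 3."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def back_translations(peptide):
    """Yield every DNA sequence (without a stop codon) that encodes `peptide`."""
    for combo in product(*(synonymous_codons(aa) for aa in peptide)):
        yield "".join(combo)


peptide = "MALK"  # hypothetical peptide used only for illustration
variants = list(back_translations(peptide))
print(len(variants), prod(len(synonymous_codons(aa)) for aa in peptide))  # 48 48
print(all(translate(v) == peptide for v in variants))  # True
```

Even this four-residue peptide has 48 distinct coding sequences (1 × 4 × 6 × 2 codons for M, A, L and K), which is why claim language of this kind is drafted around the encoded protein rather than a single nucleotide sequence.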

In another embodiment of the present invention, there 
is provided a vector/host cell capable of expressing the DNA of the 
present invention. 

In yet another embodiment of the present invention,
there is provided an isolated and purified TADG-12 protein
encoded by DNA selected from the group consisting of: (a) isolated
DNA which encodes a TADG-12 protein; (b) isolated DNA which
hybridizes to isolated DNA of (a) above and which encodes a
TADG-12 protein; and (c) isolated DNA differing from the isolated
DNAs of (a) and (b) above in codon sequence due to the
degeneracy of the genetic code, and which encodes a TADG-12
protein. Specifically, the TADG-12 protein has an amino acid
sequence shown in SEQ ID No. 2 or SEQ ID No. 4.

In still yet another embodiment of the present
invention, there is provided a method for detecting expression of a
TADG-12 protein, comprising the steps of: (a) contacting mRNA
obtained from a cell with a labeled hybridization probe; and
(b) detecting hybridization of the probe with the mRNA.
The present invention further provides methods for
diagnosing a cancer or other malignant hyperplasia by detecting 
the TADG-12 protein or mRNA disclosed herein. 

In still another embodiment of the present invention, 
there is provided a method of inhibiting expression of endogenous
TADG-12 mRNA in a cell by introducing a vector into the cell, 
wherein the vector comprises a DNA fragment of TADG-12 in 
opposite orientation operably linked to elements necessary for 
expression. 

In still yet another embodiment of the present
invention, there is provided a method of inhibiting expression of a
TADG-12 protein in a cell by introducing an antibody directed 
against a TADG-12 protein or fragment thereof. 

In still yet another embodiment of the present
invention, there is provided a method of targeted therapy by
administering a compound having a targeting moiety specific for a
TADG-12 protein and a therapeutic moiety. Specifically, the
TADG-12 protein has an amino acid sequence shown in SEQ ID No.
2 or SEQ ID No. 4.

The present invention still further provides a method
of vaccinating an individual against TADG-12 by inoculating the
individual with a TADG-12 protein or fragment thereof.
Specifically, the TADG-12 protein has an amino acid sequence
shown in SEQ ID No. 2 or SEQ ID No. 4. The TADG-12 fragment
includes the truncated form of TADG-12V peptide having a
sequence shown in SEQ ID No. 8, and a fragment of 9 to 12
residues of the TADG-12 protein.

In yet another embodiment of the present invention,
there is provided an immunogenic composition, comprising an
immunogenic fragment of a TADG-12 protein and an appropriate
adjuvant. The TADG-12 fragment includes the truncated form of
TADG-12V peptide having a sequence shown in SEQ ID No. 8, and a
fragment of 9 to 12 residues of the TADG-12 protein.
Other and further aspects, features, and advantages of
the present invention will be apparent from the following
description of the presently preferred embodiments of the
invention given for the purpose of disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above-recited features,
advantages and objects of the invention, as well as others which
will become clear, are attained and can be understood in detail,
more particular descriptions of the invention briefly summarized
above may be had by reference to certain embodiments thereof
which are illustrated in the appended drawings. These drawings
form a part of the specification. It is to be noted, however, that
the appended drawings illustrate preferred embodiments of the
invention and therefore are not to be considered limiting in their
scope.

Figure 1A shows that the expected PCR product of
approximately 180 bp and the unexpected PCR product of
approximately 300 bp, obtained with the redundant serine protease
primers, were not amplified from normal ovary cDNA (Lane 1) but
were found in abundance in ovarian tumor cDNA (Lane 2). The
primer sequences for the PCR reactions are indicated by horizontal
arrows. Figure 1B shows that TADG-12 was subcloned from the
180 bp band, while the larger 300 bp band was designated
TADG-12V. The sequences were found to overlap for 180 bp (SEQ ID
No. 5 for nucleotide sequence, SEQ ID No. 6 for deduced amino
acid sequence), with the 300 bp TADG-12V (SEQ ID No. 7 for
nucleotide sequence, SEQ ID No. 8 for deduced amino acid sequence)
having an additional insert of 133 bases. This insertion (vertical
arrow) leads to a frame shift, which causes the TADG-12V transcript
to potentially produce a truncated form of TADG-12 with a variant
amino acid sequence.
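The frame shift follows from simple arithmetic: an insertion changes the downstream reading frame unless its length is a multiple of three, and 133 = 3 × 44 + 1. The Python sketch below checks this and shows the effect on a hypothetical toy sequence (not a TADG-12 sequence); the function names are illustrative.

```python
def codons(seq):
    """Split a sequence into successive complete triplets, ignoring any trailing bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def shifts_frame(insert_length):
    """An insertion shifts the downstream reading frame unless its length is divisible by 3."""
    return insert_length % 3 != 0


insert_bases = 133  # size of the TADG-12V insert described for Figure 1B
print(insert_bases % 3, shifts_frame(insert_bases))  # 1 True

# Toy illustration: a 1-base insertion has the same net frame change as 133 bases.
reference = "ATGGCTCTGAAA"
mutant = reference[:3] + "C" + reference[3:]
print(codons(reference))  # ['ATG', 'GCT', 'CTG', 'AAA']
print(codons(mutant))     # ['ATG', 'CGC', 'TCT', 'GAA']
```

Every codon after the insertion point changes, which is consistent with a variant amino acid sequence downstream of the insert.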

Figure 2 shows that Northern blot analysis for TADG-12
revealed three transcripts of 2.4, 1.6 and 0.7 kilobases. These
transcripts were found at significant levels in ovarian tumors and
cancer cell lines, but the transcripts were found only at low levels
in normal ovary.

Figure 3 shows an RNA dot blot (CLONTECH) probed
for TADG-12. The transcript was detectable (at background
levels) in all 50 of the human tissues represented, with the
greatest abundance of transcript in the heart. Putamen, amygdala,
kidney, liver, small intestine, skeletal muscle, and adrenal gland
were also found to have intermediate levels of TADG-12
transcript.

Figure 4 shows the entire cDNA sequence for TADG-12
(SEQ ID No. 1) with its predicted open reading frame of 454
amino acids (SEQ ID No. 2). Within the nucleotide sequence, the
Kozak consensus sequence for the initiation of translation and
the polyadenylation signal are underlined. In the protein
sequence, a potential transmembrane domain is boxed. The LDLR-A
domain is underlined with a solid line. The SRCR domain is
underlined with a broken line. The residues of the catalytic triad
of the serine protease domain are circled, and the beginning of the
catalytic domain is marked with an arrow designated as a
potential proteolytic cleavage site. The * represents the stop
codon that terminates translation.

Figure 5A shows the 35 amino acid LDLR-A domain
of TADG-12 (SEQ ID No. 13) aligned with other LDLR-A motifs
from the serine protease TMPRSS2 (U75329, SEQ ID No. 14), the
complement subunit C8 (P07358, SEQ ID No. 9), two LDLR-A
domains of the glycoprotein GP300 (P98164, SEQ ID Nos. 11-12),
and the serine protease matriptase (AF118224, SEQ ID No. 10).
TADG-12 has its highest similarity with the other serine proteases,
being 54% similar to TMPRSS2 and 53% similar to
matriptase. The highly conserved cysteine residues are shown in
bold type. Figure 5B shows the SRCR domain of TADG-12 (SEQ ID
No. 17) aligned with other domain family members including the
human macrophage scavenger receptor (P21757, SEQ ID No. 16),
human enterokinase (P98073, SEQ ID No. 19), bovine enterokinase
(P21758, SEQ ID No. 15), and the serine protease TMPRSS2 (SEQ ID
No. 18). Again, TADG-12 shows its highest similarity within this
region to the protease TMPRSS2, at 43%. Figure 5C shows the
protease domain of TADG-12 (SEQ ID No. 23) in alignment with
other human serine proteases including protease M (U62801, SEQ
ID No. 20), trypsinogen I (P07477, SEQ ID No. 21), plasma
kallikrein (P03952, SEQ ID No. 22), hepsin (P05981, SEQ ID No. 25),
and TMPRSS2 (SEQ ID No. 24). Cons represents the consensus
sequence for each alignment.

Figure 6 shows semi-quantitative PCR analysis that
was performed for TADG-12 (upper panel) and TADG-12V (lower
panel). The amplification of TADG-12 or TADG-12V was
performed in parallel with PCR amplification of β-tubulin product
as an internal control. The TADG-12 transcript was found to be
overexpressed in 41 of 55 carcinomas. The TADG-12V transcript
was found to be overexpressed in 8 of 22 carcinomas examined.
Note that the samples in the upper panel are not necessarily the
same as the samples in the lower panel.

Figure 7 shows immunohistochemical staining of
normal ovary and ovarian tumors, performed using a
polyclonal rabbit antibody raised against a TADG-12-specific
peptide. No significant staining was detected in normal ovary
(Figure 7A). Strong positive staining was observed in 22 of 29
carcinomas examined. Figures 7B and 7C represent a serous and a
mucinous carcinoma, respectively. Both show diffuse staining
throughout the cytoplasm of tumor cells, while stromal cells
remain relatively unstained.
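For reference, the counts reported for Figures 6 and 7 can be expressed as percentages. The short Python sketch below does only that arithmetic and makes no statistical claim; the labels are paraphrases of the figure descriptions.

```python
# (positive, total) counts quoted in the descriptions of Figures 6 and 7
OBSERVATIONS = {
    "TADG-12 overexpressed (semi-quantitative PCR)": (41, 55),
    "TADG-12V overexpressed (semi-quantitative PCR)": (8, 22),
    "TADG-12 strong immunostaining": (22, 29),
}


def percent(positive, total):
    """Share of positive samples, as a percentage."""
    return 100 * positive / total


for label, (positive, total) in OBSERVATIONS.items():
    print(f"{label}: {positive}/{total} = {percent(positive, total):.1f}%")
```

This prints 74.5%, 36.4% and 75.9%, respectively.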

Figure 8 is a model to demonstrate the progression of
TADG-12 within a cellular context. In normal circumstances, the
TADG-12 transcript is appropriately spliced and the resulting
protein is capable of being expressed at the cell surface, where the
protease may be cleaved to an active form. The role of the
remaining ligand binding domains has not yet been determined,
but one can envision their potential to bind other molecules for
activation, internalization or both. The TADG-12V transcript,
which occurs in some tumors, may be the result of mutation
and/or poor mRNA processing and may be capable of producing a
truncated form of TADG-12 that does not have a functional
protease domain. In addition, this truncated product may present
a novel epitope at the surface of tumor cells.
DETAILED DESCRIPTION OF THE INVENTION

To examine the serine proteases expressed by ovarian
cancers, a PCR-based differential display technique was employed
utilizing redundant PCR primers designed to the most highly
conserved amino acids in these proteins [9]. As a result, a novel
cell-surface, multi-domain serine protease, named Tumor
Associated Differentially-expressed Gene-12 (TADG-12), was
identified. TADG-12 appears to be overexpressed in many ovarian
tumors. The extracellular nature of TADG-12 may render tumors
susceptible to detection via a TADG-12-specific assay. In addition,
a splicing variant of TADG-12, named TADG-12V, was detected at
elevated levels in 35% of the tumors that were examined. TADG-12V
encodes a truncated form of TADG-12 with an altered amino
acid sequence that may be a unique tumor-specific target for
future therapeutic approaches.

The TADG-12 cDNA is 2413 base pairs long (SEQ ID No.
1), encoding a 454 amino acid protein (SEQ ID No. 2). A variant
form, TADG-12V (SEQ ID No. 3), encodes a 294 amino acid protein
(SEQ ID No. 4). The availability of the TADG-12 and/or TADG-12V
gene opens the way for a number of studies that can lead to various
applications. For example, the TADG-12 and/or TADG-12V gene
can be used as a diagnostic or therapeutic target in ovarian
carcinoma and other carcinomas including breast, prostate, lung
and colon.

In accordance with the present invention, there may be
employed conventional molecular biology, microbiology, and
recombinant DNA techniques within the skill of the art. Such
techniques are explained fully in the literature. See, e.g., Maniatis,
Fritsch & Sambrook, "Molecular Cloning: A Laboratory Manual"
(1982); "DNA Cloning: A Practical Approach," Volumes I and II
(D.N. Glover ed. 1985); "Oligonucleotide Synthesis" (M.J. Gait ed.
1984); "Nucleic Acid Hybridization" [B.D. Hames & S.J. Higgins eds.
(1985)]; "Transcription and Translation" [B.D. Hames & S.J. Higgins
eds. (1984)]; "Animal Cell Culture" [R.I. Freshney, ed. (1986)];
"Immobilized Cells And Enzymes" [IRL Press, (1986)]; B. Perbal, "A
Practical Guide To Molecular Cloning" (1984).

Therefore, if appearing herein, the following terms
shall have the definitions set out below.

As used herein, the term "cDNA" shall refer to the DNA 
copy of the mRNA transcript of a gene. 

As used herein, the term "derived amino acid
sequence" shall mean the amino acid sequence determined by
reading the triplet sequence of nucleotide bases in the cDNA.
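The definition above amounts to translating the cDNA codon by codon. A minimal Python sketch, assuming the standard genetic code and a hypothetical 15-base fragment (not a TADG-12 sequence), shows how the derived sequence depends on the reading frame chosen; `derive_amino_acid_sequence` is an illustrative name.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def derive_amino_acid_sequence(cdna, frame=0):
    """Translate successive triplets starting at offset `frame` (0, 1 or 2).

    Stop codons appear as "*"; a trailing partial codon is ignored.
    """
    cdna = cdna.upper()
    return "".join(CODON_TABLE[cdna[i:i + 3]] for i in range(frame, len(cdna) - 2, 3))


fragment = "ATGGCTCTGAAATAA"  # hypothetical cDNA fragment, not a TADG-12 sequence
print(derive_amino_acid_sequence(fragment))           # MALK*
print(derive_amino_acid_sequence(fragment, frame=1))  # WL*N
```

Shifting the starting offset by a single base yields an unrelated sequence, which is why a derived sequence is only meaningful in the correct reading frame.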

As used herein the term "screening a library" shall
refer to the process of using a labeled probe to check whether,
under the appropriate conditions, there is a sequence
complementary to the probe present in a particular DNA library.
In addition, "screening a library" could be performed by PCR.

As used herein, the term "PCR" refers to the 
polymerase chain reaction that is the subject of U.S. Patent Nos. 
4,683,195 and 4,683,202 to Mullis, as well as other improvements 
now known in the art. 
The amino acids described herein are preferably in
the "L" isomeric form. However, residues in the "D" isomeric form
can be substituted for any L-amino acid residue, as long as the
desired functional property of immunoglobulin-binding is retained
by the polypeptide. NH2 refers to the free amino group present at
the amino terminus of a polypeptide. COOH refers to the free
carboxy group present at the carboxy terminus of a polypeptide.
In keeping with standard polypeptide nomenclature, J. Biol. Chem.,
243:3552-59 (1969), abbreviations for amino acid residues are
known in the art.

It should be noted that all amino-acid residue
sequences are represented herein by formulae whose left and
right orientation is in the conventional direction of
amino-terminus to carboxy-terminus. Furthermore, it should be noted
that a dash at the beginning or end of an amino acid residue
sequence indicates a peptide bond to a further sequence of one or
more amino-acid residues.

A "replicon" is any genetic element (e.g., plasmid,
chromosome, virus) that functions as an autonomous unit of DNA
replication in vivo, i.e., capable of replication under its own
control.

A "vector" is a replicon, such as plasmid, phage or 
cosmid, to which another DNA segment may be attached so as to 
bring about the replication of the attached segment. 

A "DNA molecule" refers to the polymeric form of
deoxyribonucleotides (adenine, guanine, thymine, or cytosine) in
either its single-stranded form or a double-stranded helix. This
term refers only to the primary and secondary structure of the
molecule, and does not limit it to any particular tertiary forms.
Thus, this term includes double-stranded DNA found, inter alia, in
linear DNA molecules (e.g., restriction fragments), viruses,
plasmids, and chromosomes. In discussing the structure of
particular double-stranded DNA molecules, sequences may be
described herein according to the normal convention of giving only
the sequence in the 5' to 3' direction along the nontranscribed
strand of DNA (i.e., the strand having a sequence homologous to
the mRNA).
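Under this convention, the strand written out is the nontranscribed (coding) strand; the transcribed (template) strand is its reverse complement, and the mRNA matches the written strand with U in place of T. A minimal Python sketch with a hypothetical sequence and illustrative function names:

```python
# Watson-Crick complement for DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mrna_from_coding_strand(coding):
    """mRNA has the coding-strand sequence, with uracil in place of thymine."""
    return coding.upper().replace("T", "U")


coding = "ATGGCTCTGAAATAA"  # hypothetical nontranscribed (coding) strand, 5' to 3'
template = reverse_complement(coding)
print(template)                                # TTATTTCAGAGCCAT
print(mrna_from_coding_strand(coding))         # AUGGCUCUGAAAUAA
print(reverse_complement(template) == coding)  # True
```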

An "origin of replication" refers to those DNA 
sequences that participate in DNA synthesis. 
A DNA "coding sequence" is a double-stranded DNA
sequence which is transcribed and translated into a polypeptide in
vivo when placed under the control of appropriate regulatory
sequences. The boundaries of the coding sequence are determined
by a start codon at the 5' (amino) terminus and a translation stop
codon at the 3' (carboxyl) terminus. A coding sequence can
include, but is not limited to, prokaryotic sequences, cDNA from
eukaryotic mRNA, genomic DNA sequences from eukaryotic (e.g.,
mammalian) DNA, and even synthetic DNA sequences. A
polyadenylation signal and transcription termination sequence
will usually be located 3' to the coding sequence.
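The boundary definition above (a start codon at the 5' end and an in-frame stop codon at the 3' end) can be sketched as a simple open-reading-frame scan. This Python sketch assumes ATG as the only start codon and the three standard stop codons, and uses a hypothetical sequence. It also checks the arithmetic that a 454-residue protein, as described for TADG-12, needs a 1365-base coding sequence including the stop codon, which fits within the 2413 bp cDNA. Function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_first_orf(seq):
    """Return (start, end) of the first ATG-initiated frame that reaches an in-frame
    stop codon; `end` is exclusive and includes the stop codon. None if absent."""
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start, i + 3
        start = seq.find("ATG", start + 1)  # no in-frame stop: try the next ATG
    return None


def coding_length(n_amino_acids):
    """Bases needed for n sense codons plus one stop codon."""
    return 3 * (n_amino_acids + 1)


example = "GCCACCATGGCTCTGAAATAAGC"  # hypothetical sequence
start, end = find_first_orf(example)
print(start, end, example[start:end])  # 6 21 ATGGCTCTGAAATAA
print(coding_length(454), coding_length(454) <= 2413)  # 1365 True
```

A real ORF search would also scan the reverse complement and apply a minimum length; those refinements are omitted here for brevity.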

Transcriptional and translational control sequences are 
DNA regulatory sequences, such as promoters, enhancers, 
polyadenylation signals, terminators, and the like, that provide for 
the expression of a coding sequence in a host cell. 

A "promoter sequence" is a DNA regulatory region
capable of binding RNA polymerase in a cell and initiating
transcription of a downstream (3' direction) coding sequence. For
purposes of defining the present invention, the promoter sequence
is bounded at its 3' terminus by the transcription initiation site
and extends upstream (5' direction) to include the minimum
number of bases or elements necessary to initiate transcription at
levels detectable above background. Within the promoter
sequence will be found a transcription initiation site, as well as
protein binding domains (consensus sequences) responsible for
the binding of RNA polymerase. Eukaryotic promoters often, but
not always, contain "TATA" boxes and "CAT" boxes. Prokaryotic
promoters contain Shine-Dalgarno sequences in addition to the -10
and -35 consensus sequences.
5 An "expression control sequence" is a DNA sequence 

that controls and regulates the transcription and translation of 
another DNA sequence. A coding sequence is "under the control" 
of transcriptional and translational control sequences in a cell 
when RNA polymerase transcribes the coding sequence into 

10 mRNA, which is then translated into the protein encoded by the 
coding sequence. 

A "signal sequence" can be included near the coding
sequence. This sequence encodes a signal peptide, N-terminal to
the polypeptide, that signals the host cell to direct the
polypeptide to the cell surface or to secrete the polypeptide into
the medium; this signal peptide is clipped off by the host cell
before the protein leaves the cell. Signal sequences can be found
associated with a variety of proteins native to prokaryotes and
eukaryotes.

The term "oligonucleotide", as used herein in referring
to the probe of the present invention, is defined as a molecule
composed of two or more ribonucleotides, preferably more than
three. Its exact size will depend upon many factors which, in turn,
depend upon the ultimate function and use of the oligonucleotide.

The term "primer" as used herein refers to an
oligonucleotide, whether occurring naturally as in a purified
restriction digest or produced synthetically, which is capable of
acting as a point of initiation of synthesis when placed under
conditions in which synthesis of a primer extension product, which
is complementary to a nucleic acid strand, is induced, i.e., in the
presence of nucleotides and an inducing agent such as a DNA
polymerase and at a suitable temperature and pH. The primer
may be either single-stranded or double-stranded and must be
sufficiently long to prime the synthesis of the desired extension
product in the presence of the inducing agent. The exact length of
the primer will depend upon many factors, including temperature,
the source of the primer and the use of the method. For example,
for diagnostic applications, depending on the complexity of the
target sequence, the oligonucleotide primer typically contains
15-25 or more nucleotides, although it may contain fewer
nucleotides.

The primers herein are selected to be "substantially"
complementary to different strands of a particular target DNA
sequence. This means that the primers must be sufficiently
complementary to hybridize with their respective strands.
Therefore, the primer sequence need not reflect the exact
sequence of the template. For example, a non-complementary
nucleotide fragment may be attached to the 5' end of the primer,
with the remainder of the primer sequence being complementary
to the strand. Alternatively, non-complementary bases or longer
sequences can be interspersed into the primer, provided that the
primer sequence has sufficient complementarity with the sequence
of the strand to hybridize therewith and thereby form the template
for the synthesis of the extension product.
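By way of a non-limiting illustration only, the following Python sketch scores how "substantially" complementary a primer is to a target strand, optionally ignoring a non-complementary tail attached at the primer's 5' end. It assumes a gapless, pre-aligned annealing region and that both sequences are written 5' to 3'; the function names and the example sequences are hypothetical and are not taken from the sequence listing. Practical primer design would also weigh factors such as melting temperature, which this sketch does not model.

```python
# Hypothetical helpers for illustration; all sequences are written 5'->3'.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def complementary_fraction(primer: str, target_site: str, tail_5prime: int = 0) -> float:
    """Fraction of the primer's annealing region that pairs with the target strand.

    The first `tail_5prime` bases of the primer are treated as a
    non-complementary 5' tail and ignored. The primer anneals antiparallel,
    so a perfectly matched target site equals the reverse complement of the
    annealing region. Assumes a gapless alignment of equal lengths.
    """
    annealing = primer[tail_5prime:].upper()
    site = target_site.upper()
    if not annealing or len(annealing) != len(site):
        raise ValueError("annealing region and target site must have equal, non-zero length")
    expected = reverse_complement(annealing)
    matches = sum(e == s for e, s in zip(expected, site))
    return matches / len(annealing)


# A 6-nt 5' tail (GAATTC) followed by a 6-nt annealing region (ATGCCG)
primer = "GAATTC" + "ATGCCG"
print(complementary_fraction(primer, "CGGCAT", tail_5prime=6))  # 1.0
print(complementary_fraction(primer, "CGGAAT", tail_5prime=6))  # one mismatch: 5 of 6 positions pair
```

Any cut-off for what counts as "sufficient" complementarity is left to the practitioner; the sketch only reports the fraction.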

As used herein, the terms "restriction endonucleases"
and "restriction enzymes" refer to enzymes, each of which cuts
double-stranded DNA at or near a specific nucleotide sequence.
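As a non-limiting illustration, the sketch below locates the recognition site of a restriction enzyme in a linear sequence and splits the top strand at the cut position. EcoRI, which recognizes GAATTC and cuts between the G and the first A, is used only as an example; the function names and the test sequence are hypothetical. Because GAATTC is palindromic, scanning one strand finds every site; a non-palindromic site would also require scanning the reverse complement.

```python
def find_sites(dna: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `dna`."""
    dna, site = dna.upper(), site.upper()
    return [i for i in range(len(dna) - len(site) + 1) if dna[i:i + len(site)] == site]


def digest(dna: str, site: str, cut_offset: int) -> list[str]:
    """Split a linear top-strand sequence at each recognition site.

    `cut_offset` is the position within the site after which the top strand
    is cut (1 for EcoRI, G^AATTC). Only the top strand is scanned, which is
    sufficient for palindromic sites such as GAATTC.
    """
    fragments, start = [], 0
    for pos in find_sites(dna, site):
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
    fragments.append(dna[start:])
    return fragments


print(find_sites("AAGAATTCTTGAATTCGG", "GAATTC"))  # [2, 10]
print(digest("AAGAATTCTTGAATTCGG", "GAATTC", 1))   # ['AAG', 'AATTCTTG', 'AATTCGG']
```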

A cell has been "transformed" by exogenous or
heterologous DNA when such DNA has been introduced inside the
cell. The transforming DNA may or may not be integrated
(covalently linked) into the genome of the cell. In prokaryotes,
yeast, and mammalian cells, for example, the transforming DNA
may be maintained on an episomal element such as a plasmid.
With respect to eukaryotic cells, a stably transformed cell is one in
which the transforming DNA has become integrated into a
chromosome so that it is inherited by daughter cells through
chromosome replication. This stability is demonstrated by the
ability of the eukaryotic cell to establish cell lines or clones
composed of a population of daughter cells containing the
transforming DNA. A "clone" is a population of cells derived from
a single cell or ancestor by mitosis. A "cell line" is a clone of a
primary cell that is capable of stable growth in vitro for many
generations.

Two DNA sequences are "substantially homologous"
when at least about 75% (preferably at least about 80%, and most
preferably at least about 90% or 95%) of the nucleotides match
over the defined length of the DNA sequences. Sequences that are
substantially homologous can be identified by comparing the
sequences using standard software available in sequence data
banks, or in a Southern hybridization experiment under, for
example, stringent conditions as defined for that particular
system. Defining appropriate hybridization conditions is within
the skill of the art. See, e.g., Maniatis et al., supra; DNA Cloning,
Vols. I & II, supra; Nucleic Acid Hybridization, supra.

A "heterologous" region of the DNA construct is an
identifiable segment of DNA within a larger DNA molecule that is
not found in association with the larger molecule in nature. Thus,
when the heterologous region encodes a mammalian gene, the
gene will usually be flanked by DNA that does not flank the
mammalian genomic DNA in the genome of the source organism.
In another example, the heterologous region is a construct in
which the coding sequence itself is not found in nature (e.g., a
cDNA where the genomic coding sequence contains introns, or
synthetic sequences having codons different from those of the
native gene). Allelic variations or naturally-occurring mutational
events do not give rise to a heterologous region of DNA as defined
herein.

The labels most commonly employed for these studies
are radioactive elements, enzymes, chemicals which fluoresce
when exposed to ultraviolet light, and others. A number of
fluorescent materials are known and can be utilized as labels.
These include, for example, fluorescein, rhodamine, auramine,
Texas Red, AMCA blue and Lucifer Yellow. A particular detecting
material is anti-rabbit antibody prepared in goats and conjugated
with fluorescein through an isothiocyanate.

Proteins can also be labeled with a radioactive element
or with an enzyme. The radioactive label can be detected by any
of the currently available counting procedures. The preferred
isotope may be selected from ³H, ¹⁴C, ³²P, ³⁵S, ³⁶Cl, ⁵¹Cr, ⁵⁷Co,
⁵⁸Co, ⁵⁹Fe, ⁹⁰Y, ¹²⁵I, ¹³¹I, and ¹⁸⁶Re.

Enzyme labels are likewise useful, and can be detected
by any of the presently utilized colorimetric, spectrophotometric,
fluorospectrophotometric, amperometric or gasometric techniques.
The enzyme is conjugated to the selected particle by reaction with
bridging molecules such as carbodiimides, diisocyanates,
glutaraldehyde and the like. Many enzymes which can be used in
these procedures are known and can be utilized. The preferred
enzymes are peroxidase, β-glucuronidase, β-D-glucosidase,
β-D-galactosidase, urease, glucose oxidase plus peroxidase and
alkaline phosphatase. U.S. Patent Nos. 3,654,090, 3,850,752, and
4,016,043 are referred to by way of example for their disclosure of
alternate labeling material and methods.

A particular assay system developed and utilized in
the art is known as a receptor assay. In a receptor assay, the
material to be assayed is appropriately labeled and then certain
cellular test colonies are inoculated with a quantity of both the
labeled and unlabeled material, after which binding studies are
conducted to determine the extent to which the labeled material
binds to the cell receptors. In this way, differences in affinity
between materials can be ascertained.

An assay useful in the art is known as a "cis/trans"
assay. Briefly, this assay employs two genetic constructs, one of
which is typically a plasmid that continually expresses a particular
receptor of interest when transfected into an appropriate cell line,
and the second of which is a plasmid that expresses a reporter
such as luciferase, under the control of a receptor/ligand complex.
Thus, for example, if it is desired to evaluate a compound as a
ligand for a particular receptor, one of the plasmids would be a
construct that results in expression of the receptor in the chosen
cell line, while the second plasmid would possess a promoter
linked to the luciferase gene in which the response element to the
particular receptor is inserted. If the compound under test is an
agonist for the receptor, the ligand will complex with the receptor,
and the resulting complex will bind the response element and
initiate transcription of the luciferase gene. The resulting
chemiluminescence is then measured photometrically, and dose
response curves are obtained and compared to those of known
ligands. The foregoing protocol is described in detail in U.S. Patent
No. 4,981,784.

As used herein, the term "host" is meant to include not
only prokaryotes but also eukaryotes such as yeast, plant and
animal cells. A recombinant DNA molecule or gene which encodes
a human TADG-12 protein of the present invention can be used to
transform a host using any of the techniques commonly known to
those of ordinary skill in the art. Especially preferred is the use of
a vector containing coding sequences for the gene which encodes a
human TADG-12 protein of the present invention for purposes of
prokaryote transformation. Prokaryotic hosts may include E. coli,
S. typhimurium, Serratia marcescens and Bacillus subtilis.
Eukaryotic hosts include yeasts such as Pichia pastoris,
mammalian cells and insect cells.

In general, expression vectors containing promoter
sequences which facilitate the efficient transcription of the
inserted DNA fragment are used in connection with the host. The
expression vector typically contains an origin of replication,
promoter(s), terminator(s), as well as specific genes which are
capable of providing phenotypic selection in transformed cells.
The transformed hosts can be fermented and cultured according to
means known in the art to achieve optimal cell growth.

The invention includes a substantially pure DNA
encoding a TADG-12 protein, a strand of which DNA will hybridize
at high stringency to a probe containing a sequence of at least 15
consecutive nucleotides of the sequence shown in SEQ ID No. 1 or
SEQ ID No. 3. The protein encoded by the DNA of this invention
may share at least 80% sequence identity (preferably 85%, more
preferably 90%, and most preferably 95%) with the amino acids
listed in SEQ ID No. 2 or SEQ ID No. 4. More preferably, the DNA
includes the coding sequence of the nucleotides of Figure 4 (SEQ ID
No. 1), or a degenerate variant of such a sequence.

The probe to which the DNA of the invention
hybridizes preferably consists of a sequence of at least 20
consecutive nucleotides, more preferably 40 nucleotides, even
more preferably 50 nucleotides, and most preferably 100
nucleotides or more (up to 100%) of the coding sequence of the
nucleotides listed in Figure 4 (SEQ ID No. 1) or the complement
thereof. Such a probe is useful for detecting expression of TADG-12
in a human cell by a method including the steps of (a)
contacting mRNA obtained from the cell with the labeled
hybridization probe; and (b) detecting hybridization of the probe
with the mRNA.
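The sequence criterion for such a probe, namely a run of at least a given number of consecutive nucleotides of the reference sequence or of its complement, can be sketched in Python as follows. This is a non-limiting illustration that checks sequence content only and does not model hybridization; the function names are hypothetical, the reference string is a placeholder rather than SEQ ID No. 1, and the 20-nucleotide default mirrors the preferred minimum stated above.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_qualifying_probe(probe: str, reference: str, min_len: int = 20) -> bool:
    """True if `probe` is at least `min_len` nt long and is a run of consecutive
    nucleotides of `reference` or of its complementary strand."""
    probe, reference = probe.upper(), reference.upper()
    if len(probe) < min_len:
        return False
    return probe in reference or probe in reverse_complement(reference)


# Hypothetical placeholder; substitute the actual coding sequence.
reference = "ATGGCTAGCTAGGATCCGATCGTAGCTAGCTAAGG"
probe = reference[5:25]                                           # 20 consecutive nt
print(is_qualifying_probe(probe, reference))                      # True
print(is_qualifying_probe(reverse_complement(probe), reference))  # True (complementary strand)
print(is_qualifying_probe(reference[5:15], reference))            # False (only 10 nt)
```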

This invention also includes a substantially pure DNA
containing a sequence of at least 15 consecutive nucleotides
(preferably 20, more preferably 30, even more preferably 50, and
most preferably all) of the region from nucleotides 1 to 2413 of
the nucleotides listed in SEQ ID No. 1, or of the region from
nucleotides 1 to 2544 of the nucleotides listed in SEQ ID No. 3. The
present invention also comprises antisense oligonucleotides
directed against this novel DNA. Given the teachings of the
present invention, a person having ordinary skill in this art would
readily be able to develop antisense oligonucleotides directed
against this DNA.

By "high stringency" is meant DNA hybridization and
wash conditions characterized by high temperature and low salt
concentration, e.g., wash conditions of 65°C at a salt concentration
of approximately 0.1x SSC, or the functional equivalent thereof.
For example, high stringency conditions may include hybridization
at about 42°C in the presence of about 50% formamide; a first
wash at about 65°C with about 2x SSC containing 1% SDS; followed
by a second wash at about 65°C with about 0.1x SSC.

By "substantially pure DNA" is meant DNA that is not
part of a milieu in which the DNA naturally occurs, by virtue of
separation (partial or total purification) of some or all of the
molecules of that milieu, or by virtue of alteration of sequences
that flank the claimed DNA. The term therefore includes, for
example, a recombinant DNA which is incorporated into a vector,
into an autonomously replicating plasmid or virus, or into the
genomic DNA of a prokaryote or eukaryote; or which exists as a
separate molecule (e.g., a cDNA or a genomic or cDNA fragment
produced by polymerase chain reaction (PCR) or restriction
endonuclease digestion) independent of other sequences. It also
includes a recombinant DNA which is part of a hybrid gene
encoding additional polypeptide sequence, e.g., a fusion protein.
Also included is a recombinant DNA which includes a portion of
the nucleotides shown in SEQ ID No. 3 which encodes an
alternative splice variant of TADG-12 (TADG-12V).

The DNA may have at least about 70% sequence
identity to the coding sequence of the nucleotides listed in SEQ ID
No. 1 or SEQ ID No. 3, preferably at least 75% (e.g., at least 80%),
and most preferably at least 90%. The identity between two
sequences is a direct function of the number of matching or
identical positions. When a subunit position in both of the two
sequences is occupied by the same monomeric subunit, e.g., if a
given position is occupied by an adenine in each of two DNA
molecules, then they are identical at that position. For example, if
7 positions in a sequence 10 nucleotides in length are identical to
the corresponding positions in a second 10-nucleotide sequence,
then the two sequences have 70% sequence identity. The length
of comparison sequences will generally be at least 50 nucleotides,
preferably at least 60 nucleotides, more preferably at least 75
nucleotides, and most preferably 100 nucleotides. Sequence
identity is typically measured using sequence analysis software
(e.g., Sequence Analysis Software Package of the Genetics
Computer Group, University of Wisconsin Biotechnology Center,
1710 University Avenue, Madison, WI 53705).
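The percent-identity calculation described above, including the 7-of-10 example, can be expressed as a short Python sketch. It assumes the two sequences are already aligned, of equal length and without gaps; sequence analysis software such as the package named above also performs the alignment, which is not reproduced here. The function names are hypothetical, and the 75% default threshold is taken from the "substantially homologous" definition given earlier.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions occupied by the same nucleotide.

    Assumes seq_a and seq_b are pre-aligned, gap-free and of equal length.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def is_substantially_homologous(seq_a: str, seq_b: str, threshold: float = 75.0) -> bool:
    """Hypothetical helper applying the 'at least about 75%' criterion."""
    return percent_identity(seq_a, seq_b) >= threshold


# 7 of 10 positions identical -> 70% identity, below the 75% threshold
print(percent_identity("ACGTACGTAC", "ACGTACGGGG"))            # 70.0
print(is_substantially_homologous("ACGTACGTAC", "ACGTACGGGG"))  # False
```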

The present invention includes a vector comprising a
DNA sequence which encodes a human TADG-12 protein, which
vector is capable of replication in a host and comprises, in
operable linkage: a) an origin of replication; b) a promoter; and c)
a DNA sequence coding for said protein. Preferably, the vector of
the present invention contains a portion of the DNA sequence
shown in SEQ ID No. 1 or SEQ ID No. 3. A "vector" may be defined
as a replicable nucleic acid construct, e.g., a plasmid or viral
nucleic acid. Vectors may be used to amplify and/or express
nucleic acid encoding a TADG-12 protein. An expression vector is
a replicable construct in which a nucleic acid sequence encoding a
polypeptide is operably linked to suitable control sequences
capable of effecting expression of the polypeptide in a cell. The
need for such control sequences will vary depending upon the cell
selected and the transformation method chosen. Generally, control
sequences include a transcriptional promoter and/or enhancer,
suitable mRNA ribosomal binding sites, and sequences which
control the termination of transcription and translation. Methods
which are well known to those skilled in the art can be used to
construct expression vectors containing appropriate
transcriptional and translational control signals. See, for example,
the techniques described in Sambrook et al., 1989, Molecular
Cloning: A Laboratory Manual (2nd Ed.), Cold Spring Harbor Press,
N.Y. A gene and its transcription control sequences are defined as
being "operably linked" if the transcription control sequences
effectively control the transcription of the gene. Vectors of the
invention include, but are not limited to, plasmid vectors and viral
vectors. Preferred viral vectors of the invention are those derived
from retroviruses, adenovirus, adeno-associated virus, SV40 virus,
or herpes viruses.

By a "substantially pure protein" is meant a protein
which has been separated from at least some of those components
which naturally accompany it. Typically, the protein is
substantially pure when it is at least 60%, by weight, free from the
proteins and other naturally-occurring organic molecules with
which it is naturally associated in vivo. Preferably, the purity of
the preparation is at least 75%, more preferably at least 90%, and
most preferably at least 99%, by weight. A substantially pure
TADG-12 protein may be obtained, for example, by extraction
from a natural source; by expression of a recombinant nucleic acid
encoding a TADG-12 polypeptide; or by chemically synthesizing
the protein. Purity can be measured by any appropriate method,
e.g., column chromatography such as immunoaffinity
chromatography using an antibody specific for TADG-12,
polyacrylamide gel electrophoresis, or HPLC analysis. A protein is
substantially free of naturally associated components when it is
separated from at least some of those contaminants which
accompany it in its natural state. Thus, a protein which is
chemically synthesized or produced in a cellular system different
from the cell from which it naturally originates will be, by
definition, substantially free from its naturally associated
components. Accordingly, substantially pure proteins include
eukaryotic proteins synthesized in E. coli, other prokaryotes, or
any other organism in which they do not naturally occur.

In addition to substantially full-length proteins, the
invention also includes fragments (e.g., antigenic fragments) of the
TADG-12 protein. As used herein, "fragment," as applied to a
polypeptide, will ordinarily be at least 10 residues, more typically
at least 20 residues, and preferably at least 30 (e.g., 50) residues
in length, but less than the entire, intact sequence. Fragments of
the TADG-12 protein can be generated by methods known to those
skilled in the art, e.g., by enzymatic digestion of naturally
occurring or recombinant TADG-12 protein, by recombinant DNA
techniques using an expression vector that encodes a defined
fragment of TADG-12, or by chemical synthesis. The ability of a
candidate fragment to exhibit a characteristic of TADG-12 (e.g.,
binding to an antibody specific for TADG-12) can be assessed by
methods described herein. Purified TADG-12 or antigenic
fragments of TADG-12 can be used to generate new antibodies or
to test existing antibodies (e.g., as positive controls in a diagnostic
assay) by employing standard protocols known to those skilled in
the art. Included in this invention are polyclonal antisera
generated by using TADG-12 or a fragment of TADG-12 as the
immunogen in, e.g., rabbits. Standard protocols for monoclonal
and polyclonal antibody production known to those skilled in this
art are employed. The monoclonal antibodies generated by this
procedure can be screened for the ability to identify recombinant
TADG-12 cDNA clones, and to distinguish them from known cDNA
clones.

Further included in this invention are TADG-12
proteins which are encoded at least in part by portions of SEQ ID
No. 1 or SEQ ID No. 3, e.g., products of alternative mRNA splicing or
alternative protein processing events, or in which a section of
TADG-12 sequence has been deleted. The fragment, or the intact
TADG-12 polypeptide, may be covalently linked to another
polypeptide, e.g., one which acts as a label, a ligand or a means to
increase antigenicity.

The invention also includes a polyclonal or monoclonal
antibody which specifically binds to TADG-12. The invention
encompasses not only an intact monoclonal antibody, but also an
immunologically-active antibody fragment, e.g., a Fab or F(ab')2
fragment; an engineered single chain Fv molecule; or a chimeric
molecule, e.g., an antibody which contains the binding specificity
of one antibody, e.g., of murine origin, and the remaining portions
of another antibody, e.g., of human origin.

In one embodiment, the antibody, or a fragment
thereof, may be linked to a toxin or to a detectable label, e.g., a
radioactive label, non-radioactive isotopic label, fluorescent label,
chemiluminescent label, paramagnetic label, enzyme label, or
colorimetric label. Examples of suitable toxins include diphtheria
toxin, Pseudomonas exotoxin A, ricin, and cholera toxin. Examples
of suitable enzyme labels include malate dehydrogenase,
staphylococcal nuclease, delta-5-steroid isomerase, alcohol
dehydrogenase, alpha-glycerol phosphate dehydrogenase, triose
phosphate isomerase, peroxidase, alkaline phosphatase,
asparaginase, glucose oxidase, beta-galactosidase, ribonuclease,
urease, catalase, glucose-6-phosphate dehydrogenase,
glucoamylase, acetylcholinesterase, etc. Examples of suitable
radioisotopic labels include ³H, ¹²⁵I, ¹³¹I, ³²P, ³⁵S, ¹⁴C, etc.

Paramagnetic isotopes for purposes of in vivo
diagnosis can also be used according to the methods of this
invention. There are numerous examples of elements that are
useful in magnetic resonance imaging. For discussions on in vivo
nuclear magnetic resonance imaging, see, for example, Schaefer et
al., (1989) JACC 14, 472-480; Shreve et al., (1986) Magn. Reson.
Med. 3, 336-340; Wolf, G. L., (1984) Physiol. Chem. Phys. Med.
NMR 16, 93-95; Wesbey et al., (1984) Physiol. Chem. Phys. Med.
NMR 16, 145-155; Runge et al., (1984) Invest. Radiol. 19, 408-415.
Examples of suitable fluorescent labels include a fluorescein label,
an isothiocyanate label, a rhodamine label, a phycoerythrin label, a
phycocyanin label, an allophycocyanin label, an o-phthaldehyde
label, a fluorescamine label, etc. Examples of chemiluminescent
labels include a luminol label, an isoluminol label, an aromatic
acridinium ester label, an imidazole label, an acridinium salt label,
an oxalate ester label, a luciferin label, a luciferase label, an
aequorin label, etc.

Those of ordinary skill in the art will know of other
suitable labels which may be employed in accordance with the
present invention. The binding of these labels to antibodies or
fragments thereof can be accomplished using standard techniques
commonly known to those of ordinary skill in the art. Typical
techniques are described by Kennedy et al., (1976) Clin. Chim.
Acta 70, 1-31; and Schurs et al., (1977) Clin. Chim. Acta 81, 1-40.
Coupling techniques mentioned in the latter are the
glutaraldehyde method, the periodate method, the dimaleimide
method, and the m-maleimidobenzyl-N-hydroxysuccinimide ester
method. All of these methods are incorporated by reference
herein.

Also within the invention is a method of detecting
TADG-12 protein in a biological sample, which includes the steps
of contacting the sample with the labeled antibody, e.g., a
radioactively tagged antibody specific for TADG-12, and
determining whether the antibody binds to a component of the
sample.

As described herein, the invention provides a number
of diagnostic advantages and uses. For example, the TADG-12
protein disclosed in the present invention is useful in diagnosing
cancer in different tissues since this protein is highly
overexpressed in tumor cells. Antibodies (or antigen-binding
fragments thereof) which bind to an epitope specific for TADG-12
are useful in a method of detecting TADG-12 protein in a biological
sample for diagnosis of cancerous or neoplastic transformation.
This method includes the steps of obtaining a biological sample
(e.g., cells, blood, plasma, tissue, etc.) from a patient suspected of
having cancer, contacting the sample with a labeled antibody (e.g.,
radioactively tagged antibody) specific for TADG-12, and detecting
the TADG-12 protein using standard immunoassay techniques
such as an ELISA. Antibody binding to the biological sample
indicates that the sample contains a component which specifically
binds to an epitope within TADG-12.

Likewise, a standard Northern blot assay can be used
to ascertain the relative amounts of TADG-12 mRNA in a cell or
tissue obtained from a patient suspected of having cancer, in
accordance with conventional Northern hybridization techniques
known to those of ordinary skill in the art. This Northern assay
uses a hybridization probe, e.g., radiolabelled TADG-12 cDNA,
either containing the full-length, single-stranded DNA having a
sequence complementary to SEQ ID No. 1 or SEQ ID No. 3, or a
fragment of that DNA sequence at least 20 consecutive nucleotides
in length (preferably at least 30, more preferably at least 50, and
most preferably at least 100). The DNA hybridization probe
can be labeled by any of the many different methods known to
those skilled in this art.

Antibodies to the TADG-12 protein can be used in an
immunoassay to detect increased levels of TADG-12 protein
expression in tissues suspected of neoplastic transformation.
These same uses can be achieved with Northern blot assays and
analyses.

The present invention is directed to a DNA fragment
encoding a TADG-12 protein selected from the group consisting of:

(a) an isolated DNA fragment which encodes a TADG-12 protein;

(b) an isolated DNA fragment which hybridizes to the isolated DNA
fragment of (a) above and which encodes a TADG-12 protein; and

(c) an isolated DNA fragment differing from the isolated DNA
fragments of (a) and (b) above in codon sequence due to the
degeneracy of the genetic code, and which encodes a TADG-12
protein.

Preferably, the DNA has the sequence shown in SEQ ID
No. 1 or SEQ ID No. 3. More preferably, the DNA encodes a TADG-12
protein having the amino acid sequence shown in SEQ ID No. 2
or SEQ ID No. 4.
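To illustrate alternative (c), the following non-limiting Python sketch translates DNA with the standard genetic code and shows two different codon sequences that encode the same peptide. The codon table is the standard genetic code written in the conventional T, C, A, G ordering of each codon position; the function name and the example sequences are hypothetical and are not drawn from SEQ ID No. 1 or SEQ ID No. 3.

```python
# Standard genetic code, indexed by the T, C, A, G ordering of each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Different codons (CTT/CTG for Leu, AGA/CGT for Arg) encode the same peptide.
print(translate("ATGCTTAGA"))  # MLR
print(translate("ATGCTGCGT"))  # MLR
```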

The present invention is also directed to a vector
and/or a host cell capable of expressing the DNA of the present
invention. Preferably, the vector contains DNA encoding a TADG-12
protein having the amino acid sequence shown in SEQ ID No. 2
or SEQ ID No. 4. Representative host cells include bacterial cells,
yeast cells, mammalian cells and insect cells.

The present invention is also directed to an isolated
and purified TADG-12 protein coded for by DNA selected from the
group consisting of: (a) isolated DNA which encodes a TADG-12
protein; (b) isolated DNA which hybridizes to isolated DNA of (a)
above and which encodes a TADG-12 protein; and (c) isolated DNA
differing from the isolated DNAs of (a) and (b) above in codon
sequence due to the degeneracy of the genetic code, and which
encodes a TADG-12 protein. Preferably, the isolated and purified
TADG-12 protein has the amino acid sequence shown in SEQ ID No.
2 or SEQ ID No. 4.

The present invention is also directed to a method of
detecting expression of the TADG-12 protein described herein,
comprising the steps of: (a) contacting mRNA obtained from a
cell with the labeled hybridization probe; and (b) detecting
hybridization of the probe with the mRNA.

A number of potential applications are possible for the
TADG-12 gene and gene product, including the truncated product
TADG-12V.

In one embodiment of the present invention, there is
provided a method for diagnosing a cancer by detecting a TADG-12
protein in a biological sample, wherein the presence or absence
of a TADG-12 protein indicates the presence or absence of a
cancer. Preferably, the biological sample is selected from the
group consisting of blood, urine, saliva, tears, interstitial fluid,
ascites fluid, tumor tissue biopsy and circulating tumor cells. Still
preferably, the detection of TADG-12 protein is by means selected
from the group consisting of Northern blot, Western blot, PCR, dot
blot, ELISA sandwich assay, radioimmunoassay, DNA array chips
and flow cytometry. Such a method is used for detecting ovarian
cancer, breast cancer, lung cancer, colon cancer, prostate cancer
and other cancers in which TADG-12 is overexpressed.

In another embodiment of the present invention, there
is provided a method for detecting malignant hyperplasia by
detecting a TADG-12 protein or TADG-12 mRNA in a biological
sample. Further, by comparing the TADG-12 protein or TADG-12
mRNA to reference information, a diagnosis or a treatment can be
provided. Preferably, PCR amplification is used for detecting
TADG-12 mRNA, wherein the primers utilized are selected from
the group consisting of SEQ ID Nos. 28-31. Still preferably,
detection of a TADG-12 protein is by immunoaffinity to an
antibody directed against a TADG-12 protein.

In still another embodiment of the present invention,
there is provided a method of inhibiting expression of endogenous
TADG-12 mRNA in a cell by introducing a vector comprising a DNA
fragment of TADG-12 in opposite orientation operably linked to
elements necessary for expression. As a result, the vector
produces TADG-12 antisense mRNA in the cell, which hybridizes to
endogenous TADG-12 mRNA, thereby inhibiting expression of
endogenous TADG-12 mRNA.

In still yet another embodiment of the present
invention, there is provided a method of inhibiting expression of a
TADG-12 protein by introducing an antibody directed against a
TADG-12 protein or fragment thereof. As a result, the binding of
the antibody to the TADG-12 protein or fragment thereof inhibits
the expression of the TADG-12 protein.

TADG-12 gene products, including the truncated form,
can be used for targeted therapy. Specifically, a compound having
a targeting moiety specific for a TADG-12 protein and a
therapeutic moiety is administered to an individual in need of
such treatment. Preferably, the targeting moiety is selected from
the group consisting of an antibody directed against a TADG-12
protein and a ligand or ligand binding domain that binds a TADG-12
protein. The TADG-12 protein has an amino acid sequence
shown in SEQ ID No. 2 or SEQ ID No. 4. Still preferably, the
therapeutic moiety is selected from the group consisting of a
radioisotope, a toxin, a chemotherapeutic agent, an immune
stimulant and a cytotoxic agent. Such a method can be used for
treating an individual having a disease selected from the group
consisting of ovarian cancer, lung cancer, prostate cancer, colon
cancer and other cancers in which TADG-12 is overexpressed.

In yet another embodiment of the present invention, there is provided a method of vaccinating, or producing an immune response in, an individual against TADG-12 by inoculating the individual with a TADG-12 protein or fragment thereof. Specifically, the TADG-12 protein or fragment thereof lacks TADG-12 activity, and the inoculation elicits an immune response in the individual, thereby vaccinating the individual against TADG-12. Preferably, the individual has a cancer, is suspected of having a cancer or is at risk of getting a cancer. Still preferably, the TADG-12 protein has an amino acid sequence shown in SEQ ID No. 2 or SEQ ID No. 4, while the TADG-12 fragment has a sequence shown in SEQ ID No. 8, or is a fragment of 9 to 20 residues. Examples of 9-residue fragments are shown in SEQ ID Nos. 35, 36, 55, 56, 83, 84, 97, 98, 119, 120, 122, 123 and 136.
In still yet another embodiment of the present invention, there is provided an immunogenic composition comprising an immunogenic fragment of a TADG-12 protein and an appropriate adjuvant. Preferably, the immunogenic fragment of the TADG-12 protein has a sequence shown in SEQ ID No. 8, or is a fragment of 9 to 20 residues. Examples of 9-residue fragments are shown in SEQ ID Nos. 35, 36, 55, 56, 83, 84, 97, 98, 119, 120, 122, 123 and 136.

The following examples are given for the purpose of illustrating various embodiments of the invention and are not meant to limit the present invention in any fashion.



EXAMPLE 1 

Tissue collection and storage 

Upon patient hysterectomy, bilateral salpingo-oophorectomy, or surgical removal of neoplastic tissue, the specimen was retrieved and placed on ice. The specimen was then taken to the resident pathologist for isolation and identification of specific tissue samples. Finally, the sample was frozen in liquid nitrogen, logged into the laboratory record and stored at -80°C. Additional specimens were frequently obtained from the Cooperative Human Tissue Network (CHTN). These samples were prepared by the CHTN and shipped on dry ice. Upon arrival, these specimens were logged into the laboratory record and stored at -80°C.

EXAMPLE 2 

mRNA Extraction and cDNA Synthesis

Sixty-nine ovarian tumors (4 benign tumors, 10 low malignant potential tumors and 55 carcinomas) and 10 normal ovaries were obtained from surgical specimens and frozen in liquid nitrogen. The human ovarian carcinoma cell lines SW 626 and Caov 3 and the human breast carcinoma cell lines MDA-MB-231 and MDA-MB-435S were purchased from the American Type Culture Collection (Rockville, MD). Cells were cultured to subconfluency in Dulbecco's modified Eagle's medium supplemented with 10% (v/v) fetal bovine serum and antibiotics.

Extraction of mRNA and cDNA synthesis were carried out by the methods described previously [14-16]. mRNA was isolated using a RiboSep mRNA isolation kit (Becton Dickinson Labware). In this procedure, poly A+ mRNA was isolated directly from the tissue lysate using the affinity chromatography medium oligo(dT) cellulose. cDNA was synthesized from 5.0 µg of mRNA by random hexamer priming using a 1st strand cDNA synthesis kit (CLONTECH).

EXAMPLE 3 

PCR with Redundant Primers and Cloning of TADG-12 cDNA

Redundant primers, forward 5'-TGGGTIGTIACIGCIGCICA(CT)TG-3' (SEQ ID No. 26) and reverse 5'-A(AG)IA(AG)IGCIATITCITTICC-3' (SEQ ID No. 27), corresponding to the consensus amino acid sequences surrounding the catalytic triad of serine proteases, were used to compare the PCR products from normal and carcinoma cDNAs. The appropriate bands were ligated into the Promega T-vector plasmid, and the ligation product was used to transform JM109 cells (Promega) grown on selection media. Individual colonies were selected and cultured, and plasmid DNA was isolated by means of the Wizard miniprep DNA purification system (Promega). Nucleotide sequencing was performed using the PRISM Ready Reaction DyeDeoxy terminator cycle sequencing kit (Applied Biosystems). An Applied Biosystems Model 373A DNA sequencing system was used for direct cDNA sequence determination.
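The redundant primers above combine two kinds of degeneracy: deoxyinosine ("I"), which base-pairs promiscuously, and bracketed mixed positions such as "(CT)". As a hedged illustration (this counting is not described in the original), the Python sketch below expands only the bracketed positions to count how many distinct oligonucleotides each primer pool contains; each inosine is assumed to be a single residue and is not expanded.

```python
from itertools import product


def expand(primer: str) -> list[str]:
    """Expand bracketed mixed positions, e.g. "(CT)", into every concrete oligo.

    Deoxyinosine ("I") is kept as a single residue: it pairs with several
    bases, so it is not expanded into alternative sequences here.
    """
    choices: list[list[str]] = []
    i = 0
    while i < len(primer):
        if primer[i] == "(":
            close = primer.index(")", i)
            choices.append(list(primer[i + 1:close]))  # one option per listed base
            i = close + 1
        else:
            choices.append([primer[i]])
            i += 1
    return ["".join(combo) for combo in product(*choices)]


FORWARD = "TGGGTIGTIACIGCIGCICA(CT)TG"  # SEQ ID No. 26
REVERSE = "A(AG)IA(AG)IGCIATITCITTICC"  # SEQ ID No. 27

for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
    pool = expand(primer)
    print(name, len(pool), "variants of", len(pool[0]), "nt;", primer.count("I"), "inosines")
```

Run as written, this reports 2 variants for the forward primer and 4 for the reverse primer; inosine keeps the pools small compared with spelling out every wobble position.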
The original TADG-12 subclone was randomly labeled and used as a probe to screen an ovarian tumor cDNA library by standard hybridization techniques [11,15]. The library was constructed in λZAP using mRNA isolated from the tumor cells of a stage III/grade III ovarian adenocarcinoma patient. Three overlapping clones were obtained, which spanned 2315 nucleotides. The final 99 nucleotides, representing the most 3' sequence including the poly A tail, were identified by homology with clones available in the GenBank EST database.



EXAMPLE 4

Quantitative PCR

The mRNA overexpression of TADG-12 was determined using quantitative PCR, performed according to the procedure previously reported [16]. Oligonucleotide primers were used for: TADG-12, forward 5'-GAAAGATGTGCTTGCrCTGG-3' (SEQ ID No. 28) and reverse 5'-AGTAAGTTGGAGAGGGTGGT-3' (SEQ ID No. 29); the variant TADG-12, forward 5'-TGGAGGTGGGTCTAGTTTGG-3' (SEQ ID No. 30) and reverse 5'-GTGTTTGGGTTGTACTTGCr-3' (SEQ ID No. 31); β-tubulin, forward 5'-GGGATCAAGGTGTACTAGAA-3' (SEQ ID No. 32) and reverse 5'-TAGGAGCTGGTGGACTGAGA-3' (SEQ ID No. 33). β-tubulin was utilized as an internal control. The PCR reaction mixture consisted of cDNA derived from 50 ng of mRNA, 5 pmol of sense and antisense primers for both the TADG-12 gene and the β-tubulin gene, 200 µmol of dNTPs, 5 µCi of α-32P dCTP and 0.25 unit of Taq DNA polymerase with reaction buffer (Promega) in a final volume of 25 µl. The target sequences were amplified in parallel with the β-tubulin gene. Thirty cycles of PCR were carried out in a Thermal Cycler (Perkin-Elmer Cetus). Each cycle of PCR included 30 seconds of denaturation at 94°C, 30 seconds of annealing at 60°C and 30 seconds of extension at 72°C. The PCR products were separated on 2% agarose gels, and the radioactivity of each PCR product was determined by using a PhosphorImager (Molecular Dynamics). The present study used the expression ratio (TADG-12/β-tubulin) as measured by phosphorimager to evaluate gene expression and defined the value at mean + 2SD of normal ovary as the cut-off value to determine overexpression. The Student's t test was used for comparison of the mean values of normal ovary and tumors.
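The cut-off rule above (mean + 2SD of the normal-ovary TADG-12/β-tubulin ratio) and the Student's t comparison can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the ratios are hypothetical numbers, not data from the study; the SD is taken as the sample standard deviation; and only the pooled-variance t statistic is computed. A p-value would additionally require the t distribution (for example, `scipy.stats.ttest_ind`).

```python
from math import sqrt
from statistics import mean, stdev


def overexpression_cutoff(normal_ratios: list[float]) -> float:
    """Cut-off = mean + 2 SD of the TADG-12/beta-tubulin ratio in normal ovary."""
    return mean(normal_ratios) + 2 * stdev(normal_ratios)


def student_t(a: list[float], b: list[float]) -> float:
    """Two-sample Student's t statistic using a pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical phosphorimager ratios (TADG-12 / beta-tubulin), for illustration only.
normal = [0.20, 0.25, 0.30, 0.25, 0.20]
tumors = [0.90, 0.30, 1.20]

cutoff = overexpression_cutoff(normal)
flags = [ratio > cutoff for ratio in tumors]  # True = overexpressed
print(f"cut-off {cutoff:.3f}; overexpressed: {flags}; t = {student_t(normal, tumors):.2f}")
```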



EXAMPLE 5 

Sequencing of TADG-12/TADG-12V

Utilizing a plasmid-specific primer near the cloning site, sequencing reactions were carried out using PRISM™ Ready Reaction DyeDeoxy™ terminators (Applied Biosystems cat# 401384) according to the manufacturer's instructions. Residual dye terminators were removed from the completed sequencing reaction using a Centri-Sep™ spin column (Princeton Separations cat.# CS-901). An Applied Biosystems Model 373A DNA Sequencing System was used for sequence analysis.
EXAMPLE 6

Antibody Production

Polyclonal rabbit antibodies were generated by immunization of New Zealand white rabbits with a poly-lysine linked multiple antigen peptide derived from the TADG-12 carboxy-terminal protein sequence NH2-WIHEQMERDLKT-COOH (WIHEQMERDLKT, SEQ ID No. 34). This peptide is present in full-length TADG-12, but not in TADG-12V. Rabbits were immunized with approximately 100 µg of peptide emulsified in Ribi adjuvant. Subsequent boost immunizations were carried out at 3 and 6 weeks, and rabbit serum was isolated 10 days after the boost inoculations. Sera were tested by dot blot analysis to determine affinity for the TADG-12 specific peptide. Rabbit pre-immune serum was used as a negative control.

EXAMPLE 7 

Northern Blot Analysis 

10 ng of mRNA was loaded onto a 1% formaldehyde-agarose gel, electrophoresed and blotted onto a Hybond-N+ nylon membrane (Amersham). 32P-labeled cDNA probes were made with the Prime-a-Gene Labeling System (Promega). The PCR products amplified by the same primers as above were used as probes. The blots were prehybridized for 30 min and hybridized for 60 min at 68°C with 32P-labeled cDNA probe in ExpressHyb Hybridization Solution (CLONTECH). Control hybridization to determine relative gel loading was performed with the β-tubulin probe.
Normal human tissues (spleen, thymus, prostate, testis, ovary, small intestine, colon and peripheral blood leukocyte) and normal human fetal tissues (brain, lung, liver and kidney) (Human Multiple Tissue Northern Blot; CLONTECH) were also examined by the same hybridization procedure.

EXAMPLE 8 

Immunohistochemistry 

Immunohistochemical staining was performed using a Vectastain Elite ABC Kit (Vector). Formalin-fixed, paraffin-embedded specimens were routinely deparaffinized and processed using microwave heat treatment in 0.01 M sodium citrate buffer (pH 6.0). The specimens were incubated with normal goat serum in a moist chamber for 30 minutes. TADG-12 peptide antibody was then incubated with the specimens in a moist chamber for 1 hour. Excess antibody was washed away with phosphate-buffered saline. After incubation with biotinylated anti-rabbit IgG for 30 minutes, the sections were incubated with ABC reagent (Vector) for 30 minutes. The final products were visualized using the AEC substrate system (DAKO), and sections were counterstained with hematoxylin before mounting. Negative controls were performed by using normal serum instead of the primary antibody.

EXAMPLE 9

Isolation of Catalytic Domain Subclones of TADG-12 and TADG-12 Variant

To identify serine proteases that are expressed in ovarian tumors, redundant PCR primers designed to the conserved regions of the catalytic triad of these enzymes were employed. A sense primer designed to the region surrounding the conserved histidine and an antisense primer designed to the region surrounding the conserved aspartate were used in PCR reactions with either normal ovary or ovarian tumor cDNA as template. In the reaction with ovarian tumor cDNA, a strong product band of the expected size of approximately 180 bp was observed, as well as an unexpected PCR product of approximately 300 bp, which showed strong expression in some ovarian tumor cDNAs (Figure 1A). Both of these PCR products were subcloned and sequenced. The sequence of the subclones from the 180 bp band (SEQ ID No. 5) was found to be homologous to the sequence identified in the larger, unexpected band (SEQ ID No. 7), except that the larger band had an additional insert of 133 nucleotides (Figure 1B). The smaller product of the appropriate size encoded a protein sequence (SEQ ID No. 6) homologous to other known proteases, while the sequence with the insertion (SEQ ID No. 8) encoded a frame shift from the serine protease catalytic domain and a subsequent premature translational stop codon. TADG-12 variants from four individual tumors were also subcloned and sequenced, and their sequences and inserts were found to be identical. The genomic sequences for these cDNA-derived clones were amplified by PCR, examined and found to contain potential AG/GT splice sites that would allow for production of the variant transcript.
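The frameshift argument in this example rests on simple modular arithmetic: an insertion preserves the downstream reading frame only if its length is a multiple of three. The short Python sketch below is an illustration, not part of the original analysis; it applies that check to the 133-nucleotide insert and adds the insert to the approximately 180 bp product to compare with the approximately 300 bp variant band.

```python
EXPECTED_BAND_BP = 180  # approximate size of the catalytic-domain PCR product
INSERT_NT = 133         # additional sequence found in the larger (TADG-12V) band


def frame_offset(insert_len: int) -> int:
    """Shift in reading frame left by an insertion; 0 means the frame is kept."""
    return insert_len % 3


variant_band_bp = EXPECTED_BAND_BP + INSERT_NT
offset = frame_offset(INSERT_NT)
print(f"variant band ~{variant_band_bp} bp; frame offset {offset} "
      f"({'in frame' if offset == 0 else 'frameshift'})")
```

Because 133 = 3 × 44 + 1, the insert leaves a one-nucleotide offset, consistent with the frame shift and premature stop codon described above.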


EXAMPLE 10 

Northern Blot Analysis of TADG-12 Expression

To examine transcript size and tissue distribution, the catalytic domain subclone was randomly labeled and used to probe Northern blots representing normal ovarian tissue, ovarian tumors and the cancer cell lines SW626, CAOV3, HeLa, MDA-MB-435S and MDA-MB-231 (Figure 2). Three transcripts of 2.4, 1.6 and 0.7 kilobases were observed. In blots of normal ovary and ovarian tumor, the smallest transcript (0.7 kb) was expressed at low levels in normal ovary, while all three transcripts (2.4, 1.6 and 0.7 kb) were abundantly present in serous carcinoma. In addition, Northern blots representing the normal human tissues spleen, thymus, prostate, testis, ovary, small intestine, colon and peripheral blood leukocyte, and the normal human fetal tissues brain, lung, liver and kidney, were examined. The same three transcripts were found to be expressed weakly in all of these tissues (data not shown). A human β-tubulin-specific probe was utilized as a control for relative sample loading. In addition, an RNA dot blot representing 50 human tissues was probed, which showed that this clone is weakly expressed in all tissues represented (Figure 3). It was found most prominently in heart, with intermediate levels in putamen, amygdala, kidney, liver, small intestine, skeletal muscle and adrenal gland.


EXAMPLE 11 

Sequencing and Characterization of TADG-12 

An ovarian tumor cDNA library constructed in λZAP was screened by standard hybridization techniques using the catalytic domain subclone as a probe. Two clones that overlapped with the probe were identified and sequenced and found to represent 2316 nucleotides. The 97 nucleotides at the 3' end of the transcript, including the polyadenylation signal and the poly(A) tail, were identified by homology with clones available in GenBank's EST database. This brought the total size of the transcript to 2413 bases (SEQ ID No. 1, Figure 4). Subsequent screening of GenBank's Genomic Database revealed that TADG-12 is homologous to a cosmid from chromosome 17. This cosmid has the accession number AC015555.

The identified cDNA includes an open reading frame that would produce a predicted protein of 454 amino acids (SEQ ID No. 2), named Tumor Associated Differentially-Expressed Gene 12 (TADG-12). The sequence has been submitted to the GenBank database and granted accession # AF201380. Homology alignment indicates that this protein contains several domains, including an amino-terminal cytoplasmic domain, a potential Type II transmembrane domain followed by a low-density lipoprotein receptor-like class A domain (LDLR-A), a scavenger receptor cysteine-rich domain (SRCR), and an extracellular serine protease domain.

As predicted by the TMpred program, TADG-12 contains a highly hydrophobic stretch of amino acids that could serve as a potential transmembrane domain, which would retain the amino terminus of the protein within the cytoplasm and expose the ligand binding domains and protease domain to the extracellular space. This general structure is consistent with other known transmembrane proteases, including hepsin [17] and TMPRSS2 [18], and TADG-12 is particularly similar in structure to the TMPRSS2 protease.

The LDLR-A domain of TADG-12 is represented by the sequence from amino acid 74 to 108 (SEQ ID No. 13). The LDLR-A domain was originally identified within the LDL receptor [19] as a series of repeated sequences of approximately 40 amino acids, which contained 6 invariant cysteine residues and highly conserved aspartate and glutamate residues. Since that initial identification, a host of other genes have been identified that contain motifs homologous to this domain [20]. Several proteases have been identified that contain LDLR-A motifs, including matriptase, TMPRSS2 and several complement components. A comparison of TADG-12 with other known LDLR-A domains is shown in Figure 5A. The similarity of these sequences ranges from 44 to 54% similar or identical amino acids.

In addition to the LDLR-A domain, TADG-12 contains another extracellular ligand binding domain with homology to the group A SRCR family. This family of protein domains is typically defined by the conservation of 6 cysteine residues within a sequence of approximately 100 amino acids [23]. The SRCR domain of TADG-12 spans amino acids 109 to 206 (SEQ ID No. 17); this domain was aligned with other SRCR domains and found to have between 36 and 43% similarity (Figure 5B). However, TADG-12 has only 4 of the 6 conserved cysteine residues. This is similar to the SRCR domain found in the protease TMPRSS2.
20 TMPRSS2. 

The TADG-12 protein also includes a serine protease domain of the trypsin family of proteases. An alignment of the catalytic domain of TADG-12 with other known proteases is shown in Figure 5C. The similarity among these sequences ranges from 48 to 55%, and TADG-12 is most similar to the serine protease TMPRSS2, which also contains a transmembrane domain, an LDLR-A domain and an SRCR domain. There is a conserved amino acid motif (RIVGG) downstream from the SRCR domain that is a potential cleavage/activation site common to many serine proteases of this family [25]. This suggests that TADG-12 is trafficked to the cell surface, where the ligand binding domains are capable of interacting with extracellular molecules and the protease domain is potentially activated. TADG-12 also contains conserved cysteine residues (amino acids 208 and 243), which in other proteases form a disulfide bond capable of linking the activated protease to the other extracellular domains.

EXAMPLE 12

Quantitative PCR Characterization of the Alternative Transcript

The original TADG-12 subclone was identified as highly expressed in the initial redundant-primer PCR experiment. The TADG-12 variant form (TADG-12V), with the insertion of 133 bp, was also easily detected in the initial experiment. To determine the frequency of this expression and whether the expression level differed between normal ovary and ovarian tumors, a previously authenticated semi-quantitative PCR technique was employed [16]. The PCR analysis co-amplified a product for β-tubulin with either a product specific to TADG-12 or one specific to TADG-12V in the presence of a radiolabeled nucleotide. The products were separated by agarose gel electrophoresis, and a phosphorimager was used to quantitate the relative abundance of each PCR product. Examples of these PCR amplification products are shown for both TADG-12 and TADG-12V in Figure 6. Normal expression was defined as the mean ratio of TADG-12 (or TADG-12V) to β-tubulin ± 2SD as examined in normal ovarian samples. For tumor samples, overexpression was defined as more than 2SD above the normal TADG-12/β-tubulin or TADG-12V/β-tubulin ratio. The results are summarized in Table 1 and Table 2. TADG-12 was found to be overexpressed in 41 of 55 carcinomas examined, while the variant form was present at aberrantly high levels in 8 of 22 carcinomas. As determined by the Student's t test, these differences were statistically significant (p < 0.05).

TABLE 1

Frequency of Overexpression of TADG-12 in Ovarian Carcinoma

Histology Type            TADG-12 (%)
Normal                    0/16 (0%)
LMP-Serous                3/6 (50%)
LMP-Mucinous              0/4 (0%)
Serous Carcinoma          23/29 (79%)
Mucinous Carcinoma        7/12 (58%)
Endometrioid Carcinoma    8/8 (100%)
Clear Cell Carcinoma      3/6 (50%)
Benign Tumors             3/4 (75%)

Overexpression = more than two standard deviations above the mean for normal ovary
LMP = low malignant potential tumor
TABLE 2

Frequency of Overexpression of TADG-12V in Ovarian Carcinoma

Histology Type            TADG-12V (%)
Normal                    [illegible]
LMP-Serous                [illegible]
LMP-Mucinous              0/3 (0%)
Serous Carcinoma          4/14 (29%)
Mucinous Carcinoma        3/5 (60%)
Endometrioid Carcinoma    1/3 (33%)
Clear Cell Carcinoma      N/D

Overexpression = more than two standard deviations above the mean for normal ovary; LMP = low malignant potential tumor
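The pooled carcinoma figures quoted in Example 12 (41 of 55 for TADG-12 and 8 of 22 for TADG-12V) follow from summing the carcinoma rows of Tables 1 and 2. The Python sketch below simply re-adds the tabulated counts to show that arithmetic. Low malignant potential, benign and normal rows are left out because the text refers to carcinomas only, and the clear cell row of Table 2 is omitted because it was not determined.

```python
# (overexpressing, examined) for each carcinoma type, copied from Tables 1 and 2.
TADG12 = {"Serous": (23, 29), "Mucinous": (7, 12), "Endometrioid": (8, 8), "Clear cell": (3, 6)}
TADG12V = {"Serous": (4, 14), "Mucinous": (3, 5), "Endometrioid": (1, 3)}  # clear cell: N/D


def pooled(rows: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum positives and totals across histology types."""
    return sum(p for p, _ in rows.values()), sum(n for _, n in rows.values())


def percent(positive: int, total: int) -> int:
    """Whole-number percentage, as printed in the tables."""
    return round(100 * positive / total)


for label, rows in (("TADG-12", TADG12), ("TADG-12V", TADG12V)):
    pos, tot = pooled(rows)
    print(f"{label}: {pos}/{tot} carcinomas ({percent(pos, tot)}%)")
```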

EXAMPLE 13 

Immunohistochemical Analysis of TADG-12 in Ovarian Tumor Cells

In order to examine the TADG-12 protein, polyclonal rabbit antisera to a peptide located in the carboxy-terminal amino acid sequence were developed. These antibodies were used to examine the expression level of the TADG-12 protein and its localization within normal ovary and ovarian tumor cells by immunolocalization. No staining was observed in normal ovarian tissues (Figure 7A), while significant staining was observed in 22 of 29 tumors studied. Representative tumor samples are shown in Figures 7B and 7C. TADG-12 is found in a diffuse pattern throughout the cytoplasm, indicative of a protein in a trafficking pathway, and, as expected, is also found at the cell surface in these tumor samples. It should be noted that the antibody developed and used for immunohistochemical analysis would not detect the truncated TADG-12V protein.

The results of the immunohistochemical staining are summarized in Table 3. Twenty-two of 29 ovarian tumors showed positive staining for TADG-12, whereas normal ovarian surface epithelium showed no expression of the TADG-12 antigen. Eight of 10 serous adenocarcinomas, 8 of 8 mucinous adenocarcinomas, 1 of 2 clear cell carcinomas and 4 of 6 endometrioid carcinomas showed positive staining.



TABLE 3

Case  Stage  Histology          Grade  LN*  TADG12  Prognosis
1            Normal ovary                   0-
2            Normal ovary                   0-
3            Normal ovary                   0-
4            Mucinous B                ND   0-      Alive
5            Mucinous B                ND   1+      Alive
6     1a     Serous LMP         G1     ND   1+      Alive
7     1a     Mucinous LMP       G1     ND   1+      Alive
8     1a     Mucinous CA        G1     ND   1+      Alive
9     1a     Mucinous CA        G2     ND   1+      Alive
10    1a     Endometrioid CA    G1     ND   0-      Alive
11    1c     Serous CA          G1     N    1+      Alive
12    1c     Mucinous CA        G1     N    1+      Alive
13    1c     Mucinous CA        G1     N    2+      Alive
14    1c     Clear cell CA      G2     N    0-      Alive
15    1c     Clear cell CA      G2     N    0-      Alive
16    2c     Serous CA          G3     N    2+      Alive
17    3a     Mucinous CA        G2     N    2+      Alive
18    3b     Serous CA          G1     ND   1+      Alive
19    3c     Serous CA          G1     N    0-      Dead
20    3c     Serous CA          G3     P    1+      Alive
21    3c     Serous CA          G2     P    2+      Alive
22    3c     Serous CA          G1     P    2+      Unknown
23    3c     Serous CA          G3     ND   2+      Alive
24    3c     Serous CA          G2     N    0-      Dead
25    3c     Mucinous CA        G1     P    2+      Dead
26    3c     Mucinous CA        G2     ND   1+      Unknown
27    3c     Mucinous CA        G2     N    1+      Alive
28    3c     Endometrioid CA    G1     P    1+      Dead
29    3c     Endometrioid CA    G2     N    0-      Alive
30    3c     Endometrioid CA    G2     P    1+      Dead
31    3c     Endometrioid CA    G3     P    1+      Alive
32    3c     Clear Cell CA      G3     P    2+      Dead

LN* = Lymph Node; B = Benign; N = Negative; P = Positive; ND = Not Done
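The summary statement that 22 of 29 ovarian tumors stained positively can be re-derived from the TADG12 column of Table 3. The Python sketch below is offered only as a tally aid. It lists the scores for tumor cases 4-32 in case order and counts them, assuming that a score of "0-" is negative and that "1+" or "2+" is positive; this scoring convention is an assumption, not something the text states.

```python
from collections import Counter

# TADG12 immunostaining scores for tumor cases 4-32 of Table 3, in case order.
TUMOR_SCORES = [
    "0-", "1+", "1+", "1+", "1+", "1+", "0-",        # cases 4-10
    "1+", "1+", "2+", "0-", "0-", "2+", "2+",        # cases 11-17
    "1+", "0-", "1+", "2+", "2+", "2+", "0-",        # cases 18-24
    "2+", "1+", "1+", "1+", "0-", "1+", "1+", "2+",  # cases 25-32
]

counts = Counter(TUMOR_SCORES)
positive = len(TUMOR_SCORES) - counts["0-"]  # 1+ and 2+ treated as positive
print(f"{positive} of {len(TUMOR_SCORES)} tumors positive; by score: {dict(counts)}")
```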



EXAMPLE 14

Peptide Ranking

For vaccine or immune stimulation, individual 9-mers to 11-mers of the TADG-12 protein were examined to rank the binding of individual peptides to the top 8 haplotypes in the general population [Parker et al., (1994)]. The computer program used for this analysis can be found at <http://www-bimas.dcrt.nih.gov/molbio/hla_bind/>. Table 4 shows the peptide ranking based upon the predicted half-life of each peptide's binding to a particular HLA allele. A larger half-life indicates a stronger association between that peptide and the particular HLA molecule. The TADG-12 peptides that strongly bind to an HLA allele are putative immunogens and are used to inoculate an individual against TADG-12.
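The ranking procedure described here involves two bookkeeping steps: enumerate every overlapping window of the protein at a fixed length, then sort the windows by a predicted binding score. The Python sketch below illustrates only those steps. It does not reimplement the BIMAS half-life prediction, so the scores are copied from the HLA A0201 rows of Table 4. The 22-residue fragment (residues 3-24) is an assumption, assembled here from the overlapping Table 4 peptides ENDPPAVEA, AVEAPFSFR, FSFRSLFGL and RSLFGLDDL, and `kmers` and `rank` are hypothetical helper names.

```python
from typing import Iterator


def kmers(seq: str, k: int, first_residue: int = 1) -> Iterator[tuple[int, str]]:
    """Yield (start, peptide) for every k-residue window; start is 1-based."""
    for i in range(len(seq) - k + 1):
        yield first_residue + i, seq[i:i + k]


def rank(rows: list[tuple[int, str, float]]) -> list[tuple[int, str, float]]:
    """Sort (start, peptide, half_life) rows so the largest half-life comes first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Residues 3-24 of TADG-12, assembled from overlapping Table 4 peptides (assumption).
FRAGMENT = "ENDPPAVEAPFSFRSLFGLDDL"
windows = dict(kmers(FRAGMENT, 9, first_residue=3))
print(windows[13])  # the 9-mer starting at residue 13

# Half-lives copied from the HLA A0201 section of Table 4.
a0201 = [(13, "FSFRSLFGL", 31.661), (40, "ILSLLPFEV", 685.783), (144, "AQLGFPSYV", 545.316)]
for position, (start, peptide, half_life) in enumerate(rank(a0201), 1):
    print(position, start, peptide, half_life)
```

Calling `kmers` with `k` set to 10 or 11 covers the longer peptides mentioned in the text. Only the scoring step would change between HLA alleles.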



TABLE 4

TADG-12 peptide ranking

Rank  Start  Peptide      Predicted       SEQ ID
                          half-life

HLA A0201
1     40     ILSLLPFEV    685.783         35
2     144    AQLGFPSYV    545.316         36
3     225    LLSQWPWQA    63.342          37
4     252    WIITAAHCV    43.992          38
5     356    VLNHAAVPL    36.316          39
6     176    LLPDDKVTA    34.627          40
7     13     FSFRSLFGL    31.661          41
8     151    YVSSDNLRV    27.995          42
9     436    RVTSFLDWI    21.502          43
10    234    SLQFQGYHL    21.362          44
11    181    KVTALHHSV    21.300          45
12    183    TALHHSVYV    19.658          46
13    411    RLWKLVGAT    18.494          47
14    60     LILALAIGL    18.476          48
15    227    SQWPWQASL    17.977          49
16    301    RLGNDIALM    11.426          50
17    307    ALMKLAGPL    10.275          51
18    262    DLYLPKSWT    9.837           52
19    416    LVGATSFGI    9.001           53
20    54     SLGIIALIL    8.759           54

HLA A0205
1     218    IVGGNMSLL    47.600          55
2     60     LILALAIGL    35.700          48
3     35     AVAAQILSL    28.000          56
4     307    ALMKLAGPL    21.000          51
5     271    IQVGLVSLL    19.040          57
6     397    CQGDSGGPL    16.800          58
7     227    SQWPWQASL    16.800          49
8     270    TIQVGLVSL    14.000          59
9     56     GIIALILAL    14.000          60
10    110    RVGGQNAVL    14.000          61
11    181    KVTALHHSV    12.000          45
12    151    YVSSDNLRV    12.000          42
13    356    VLNHAAVPL    11.900          39
14    144    AQLGFPSYV    9.600           36
15    13     FSFRSLFGL    7.560           41
16    54     SLGIIALIL    7.000           54
17    234    SLQFQGYHL    7.000           44
18    217    RIVGGNMSL    7.000           62
19    411    RLWKLVGAT    6.000           47
20    252    WIITAAHCV    6.000           38

HLA A1
1     130    CSDDWKGHY    37.500          63
2     8      AVEAPFSFR    9.000           64
3     328    NSEENFPDG    2.700           65
4     3      ENDPPAVEA    2.500           66
5     98     DCKDGEDEY    2.500           67
6     346    ATEDGGDAS    2.250           68
7     360    AAVPLISNK    2.000           69
8     153    SSDNLRVSS    1.500           70
9     182    VTALHHSVY    1.250           71
10    143    CAQLGFPSY    1.000           72
11    259    CVYDLYLPK    1.000           73
12    369    ICNHRDVYG    1.000           74
13    278    LLDNPAPSH    1.000           75
14    426    CAEVNKPGV    1.000           76
15    32     DADAVAAQI    1.000           77
16    406    VCQERRLWK    1.000           78
17    329    SEENFPDGK    0.900           79
18    303    GNDIALMKL    0.625           80
19    127    KTMCSDDWK    0.500           81
20    440    FLDWIHEQM    0.500           82

HLA A24
1     433    VYTRVTSFL    280.000         83
2     263    LYLPKSWTI    90.000          84
3     169    EFVSIDHLL    42.000          85
4     217    RIVGGNMSL    12.000          62
5     296    KYKPKRLGN    12.000          86
6     16     RSLFGLDDL    12.000          87
7     267    KSWTIQVGL    11.200          88
8     81     RSSFKCIEL    8.800           89
9     375    VYGGIISPS    8.000           90
10    110    RVGGQNAVL    8.000           91
11    189    VYVREGCAS    7.500           92
12    60     LILALAIGL    7.200           48
13    165    QFREEFVSI    7.200           93
14    271    IQVGLVSLL    7.200           57
15    56     GIIALILAL    7.200           60
16    10     EAPFSFRSL    7.200           94
17    307    ALMKLAGPL    7.200           51
18    407    CQERRLWKL    6.600           95
19    356    VLNHAAVPL    6.000           39
20    381    SPSMLCAGY    6.000           96

HLA B7
1     375    VYGGIISPS    200.000         97
2     381    SPSMLCAGY    80.000          98
3     362    VPLISNKIC    80.000          99
4     35     AVAAQILSL    60.000          56
5     373    RDVYGGIIS    40.000          100
6     307    ALMKLAGPL    36.000          51
7     283    APSHLVEKI    24.000          101
8     177    LPDDKVTAL    24.000          102
9     47     EVFSQSSSL    20.000          103
10    110    RVGGQNAVL    20.000          91
11    218    IVGGNMSLL    20.000          55
12    36     VAAQILSLL    12.000          104
13    255    TAAHCVYDL    12.000          105
14    10     EAPFSFRSL    12.000          94
15    138    YANVACAQL    12.000          106
16    195    CASGHWTL     12.000          107
17    215    SSRIVGGNM    10.000          108
18    298    KPKRLGNDI    8.000           109
19    313    GPLTFNEMI    8.000           110
20    108    CVRVGGQNA    5.000           111

HLA B8
1     294    HSKYKPKRL    80.000          112
2     373    RDVYGGIIS    16.000          100
3     177    LPDDKVTAL    4.800           102
4     265    LPKSWTIQV    2.400           113
5     88     ElilTRCDGV   2.400           114
6     298    KPKRLGNDI    2.000           109
7     81     RSSFKCIEL    2.000           89
8     375    VYGGIISPS    2.000           97
9     79     RCRSSFKCI    2.000           115
10    10     EAPFSFRSL    1.600           94
11    215    SSRIVGGNM    1.000           108
12    36     VAAQILSLL    0.800           104
13    255    TAAHCVYDL    0.800           116
14    381    SPSMLCAGY    0.800           98
15    195    CASGHWTL     0.800           107
16    362    VPLISNKIC    0.800           99
17    138    YANVACAQL    0.800           106
18    207    ACGHRRGYS    0.400           117
19    154    SDNLRVSSL    0.400           118
20    47     EVFSQSSSL    0.400           103

HLA B2702
1     300    KRLGNDIAL    180.000         119
2     435    TRVTSFLDW    100.000         120
3     376    YGGIISPSM    100.000         121
4     410    RRLWKLVGA    60.000          122
5     210    HRRGYSSRI    60.000          123
6     227    SQWPWQASL    30.000          49
7     109    VRVGGQNAV    20.000          124
8     191    VREGCASGH    20.000          125
9     78     YRCRSSFKC    20.000          126
10    113    GQNAVLQVF    20.000          127
11    91     TRCDGVSDC    20.000          128
12    38     AQILSLLPF    20.000          129
13    211    RRGYSSRIV    18.000          130
14    216    SRIVGGNMS    10.000          131
15    118    LQVFTAASW    10.000          132
16    370    CNHRDVYGG    10.000          133
17    393    GVDSCQGDS    10.000          134
18    235    LQFQGYHLC    10.000          135
19    271    IQVGLVSLL    6.000           57
20    408    CQERRLWKL    6.000           95

HLA B4403
1     427    AEVNKPGVY    90.000          136
2     162    LEGQFREEF    40.000          137
3     9      VEAPFSFRS    24.000          138
4     318    NEMIQPVCL    12.000          139
5     256    AAHCVYDLY    9.000           140
6     98     DCKDGEDEY    9.000           67
7     46     FEVFSQSSS    8.000           141
8     38     AQILSLLPF    7.500           129
9     64     LAIGLGIHF    7.500           142
10    192    REGCASGHV    6.000           143
11    330    EENFPDGKV    6.000           144
12    182    VTALHHSVY    6.000           145
13    408    QERRLWKLV    6.000           146
14    206    TACGHRRGY    4.500           147
15    5      DPPAVEAPF    4.500           148
16    261    YDLYLPKSW    4.500           149
17    33     ADAVAAQIL    4.500           150
18    168    EEFVSIDHL    4.000           151
19    304    NDIALMKLA    3.750           152
20    104    DEYRCVRVG    3.600           153



Conclusion

In this study, a serine protease was identified by 
means of a PGR based strategy. By Northern blot, the largest 
transcript for this gene is approximately 2.4 kb, and it is found to 
be expressed at high levels in ovarian tumors while found at 

10 minimal levels in all other tissues examined. The full-length cDNA 
encoding a novel multi-domain, cell-surface serine protease was 
cloned, named TADG-12. The 454 amino acid protein contains a 
cytoplasmic domain, a type II transmembrane domain, an LDLR-A 
domain, an SRCR domain and a serine protease domain. Using a 

15 semi-quantitative PGR analysis, it was shown that TADG-12 is 
overexpressed in a majority of tumors studied. 
Immunohistochemical staining corroborates that in some cases 
this protein is localized to the cell-surface of tumor cells and this 
suggests that TADG-12 has some extracellular proteolytic 

20 functions. Interestingly, TADG-12 also has a variant splicing form 
that is present in 35% of the tumors studied. This variant mRNA 
would lead to a truncated protein that may provide a unique 
peptide sequence on the surface of tumor cells. 
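The effect of the variant transcript on the encoded peptide can be illustrated with the partial TADG12 and TADG12-V sequences shown in FIGS. 1A and 1B. The Python sketch below is a minimal illustration under stated assumptions: it uses the standard genetic code, reads TADG12-V in the frame used by the displayed translation (starting at nucleotide 3), includes only nucleotides 1-135 of that fragment, and relies on a hypothetical helper, `translate_until_stop`, written for this example.

```python
from itertools import product

# Standard genetic code: codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Partial TADG12 sequence (nt 1-60 of the TADG12 fragment in the figure).
WILD_TYPE = "TGGGTGGTGACGGCGGCGCACTGTGTTTATGACTTGTACCTCCCCAAGTCATGGACCATC"

# Partial TADG12-V sequence (nt 1-135 of the TADG12-V fragment in the figure).
VARIANT = (
    "GGGTGGTGACGGCGGCGCACTGTGTTTATGAGATTGTAGCTCCTAGAGAAAGGGCAGACA"
    "GAAGAGGAAGGAAGCTCCTGTGCTGGAGGAAACCCACAAAAATGAAAGGACCTAGACCTT"
    "CCCATAGCTAATTCC"
)


def translate_until_stop(dna: str, frame_start: int = 0) -> str:
    """Translate from a 0-based offset, stopping at the first stop codon or the end."""
    protein = []
    for i in range(frame_start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_until_stop(WILD_TYPE))               # shared stretch, then DLYLPKSWTI
    print(translate_until_stop(VARIANT, frame_start=2))  # shared stretch, then divergent peptide
```

The two translations share the VVTAAHCVY stretch and then diverge; in the variant frame a TAA stop codon is reached shortly afterwards, consistent with the truncated TADG-12V product described above.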

This protein contains two extracellular domains, which might confer unusual properties on this multidomain molecule. Although the precise role of the LDLR-A domain in proteases remains unclear, this domain has the capacity to bind calcium and other positively charged ligands [21,22]. This binding may play an important role in regulation of the protease or in subsequent internalization of the molecule. The SRCR domain was originally identified within the macrophage scavenger receptor and was functionally described as binding lipoproteins. Not only are SRCR domains capable of binding lipoproteins, but they may also bind molecules as diverse as polynucleotides [23]. More recent studies have identified members of this domain family in proteins ranging from proteases to cell adhesion molecules involved in maturation of the immune system [24]. In addition, TADG-12, like TMPRSS2, has only four of six cysteine residues conserved within its SRCR domain. This difference may give these domains distinct structural features that confer unusual ligand-binding properties. At present, only the function of the SRCR domain encoded by CD6 is well documented; in CD6, the SRCR domain binds the cell adhesion molecule ALCAM [23]. This role in cell adhesion is a useful starting point for research on newly identified SRCR domains; however, the possibility that this domain has multiple functions cannot be overlooked. SRCR domains are clearly capable of cell-adhesion-type interactions, but their capacity to bind other types of ligands should also be considered.

At this time, the precise role of TADG-12 remains unclear. No substrates have been identified for the protease domain, nor have ligands been identified for the extracellular LDLR-A and SRCR domains. Figure 8 presents a working model of TADG-12 based on the information disclosed in the present invention. Two transcripts are produced, which lead to the production of either TADG-12 or the truncated TADG-12V protein. Either protein is potentially targeted to the cell surface. TADG-12 is capable of becoming an activated serine protease, whereas TADG-12V is a truncated protein product that, if present at the cell surface, may represent a tumor-specific epitope.

The central problem in treating ovarian cancer today remains the inability to diagnose the disease at an early stage. Identifying genes that are expressed early in the disease process, such as proteases that are essential for tumor cell growth [26], is an important step toward improving treatment. With this knowledge, it may be possible to design assays that detect highly expressed genes, such as the TADG-12 protease described here or previously described proteases, in order to diagnose these cancers at an earlier stage. Panels of markers may also provide prognostic information and could lead to therapeutic strategies tailored to individual patients. Alternatively, inhibition of enzymes such as proteases may be an effective means of slowing the progression of ovarian cancer and improving patients' quality of life. Other features of TADG-12 and TADG-12V are also important for future research. The extracellular ligand-binding domains are natural targets for drug delivery systems, and the aberrant peptide associated with the TADG-12V protein may provide an excellent target for drug delivery or for immune stimulation.
Any patents or publications mentioned in this specification are indicative of the levels of those skilled in the art to which the invention pertains. These patents and publications are herein incorporated by reference to the same extent as if each individual publication was specifically and individually indicated to be incorporated by reference.

One skilled in the art will readily appreciate that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those inherent therein. The present examples along with the methods, procedures, treatments, molecules, and specific compounds described herein are presently representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Changes therein and other uses will occur to those skilled in the art which are encompassed within the spirit of the invention as defined by the scope of the claims.
WHAT IS CLAIMED IS:

1. A DNA fragment encoding Tumor Associated Differentially-Expressed Gene-12 (TADG-12) protein selected from the group consisting of:

(a) an isolated DNA fragment which encodes a TADG-12 protein;

(b) an isolated DNA fragment which hybridizes to isolated DNA fragment of (a) above and which encodes a TADG-12 protein; and

(c) an isolated DNA fragment differing from the isolated DNA fragments of (a) and (b) above in codon sequence due to the degeneracy of the genetic code, and which encodes a TADG-12 protein.

2. The DNA fragment of claim 1, wherein said DNA fragment has the sequence selected from the group consisting of SEQ ID No. 1 and SEQ ID No. 3.

3. The DNA fragment of claim 1, wherein said TADG-12 protein has the amino acid sequence selected from the group consisting of SEQ ID No. 2 and SEQ ID No. 4.

4. A vector comprising the DNA fragment of claim 1 and regulatory elements necessary for expression of the DNA in a cell.

5. The vector of claim 4, wherein said DNA fragment encodes a TADG-12 protein having the amino acid sequence selected from the group consisting of SEQ ID No. 2 and SEQ ID No. 4.

6. A host cell transfected with the vector of claim 4, said vector expressing a TADG-12 protein.

7. The host cell of claim 6, wherein said cell is selected from the group consisting of a bacterial cell, a mammalian cell, a plant cell and an insect cell.

8. The host cell of claim 7, wherein said bacterial cell is E. coli.

9. An antisense oligonucleotide directed against the DNA fragment of claim 1.

10. An isolated and purified TADG-12 protein coded for by DNA selected from the group consisting of:

(a) isolated DNA which encodes a TADG-12 protein;

(b) isolated DNA which hybridizes to isolated DNA of (a) above and which encodes a TADG-12 protein; and

(c) isolated DNA differing from the isolated DNAs of (a) and (b) above in codon sequence due to the degeneracy of the genetic code, and which encodes a TADG-12 protein.

11. The isolated and purified TADG-12 protein of claim 10, wherein said TADG-12 protein has an amino acid sequence selected from the group consisting of SEQ ID No. 2 and SEQ ID No. 4.
12. A method for detecting expression of the TADG-12 protein of claim 10, comprising the steps of:

(a) contacting mRNA obtained from a cell with a labeled hybridization probe; and

(b) detecting hybridization of the probe with the mRNA.

13. An antibody directed against the TADG-12 protein of claim 10.

10 

14. A method for diagnosing a cancer in an individual, comprising the steps of:

(a) obtaining a biological sample from said individual; and

(b) detecting a TADG-12 protein in said sample,

wherein the presence of a TADG-12 protein in said sample is indicative of the presence of a cancer in said individual, wherein the absence of a TADG-12 protein in said sample is indicative of the absence of a cancer in said individual.

15. The method of claim 14, wherein said biological sample is selected from the group consisting of blood, urine, saliva, tears, interstitial fluid, ascites fluid, tumor tissue biopsy and circulating tumor cells.

16. The method of claim 14, wherein said detection of a TADG-12 protein is by means selected from the group consisting of Northern blot, Western blot, PCR, dot blot, ELISA sandwich assay, radioimmunoassay, DNA array chips and flow cytometry.



17. The method of claim 14, wherein said cancer is selected from the group consisting of ovarian cancer, breast cancer, lung cancer, colon cancer, prostate cancer and other cancers in which TADG-12 is overexpressed.

18. A method for detecting malignant hyperplasia in a biological sample, comprising the steps of:

(a) isolating mRNA from said sample; and

(b) detecting TADG-12 mRNA in said sample,

wherein the presence of said TADG-12 mRNA in said sample is indicative of the presence of malignant hyperplasia, wherein the absence of said TADG-12 mRNA in said sample is indicative of the absence of malignant hyperplasia.

19. The method of claim 18, further comprising the step of comparing said TADG-12 mRNA to reference information, wherein said comparison provides a diagnosis of said malignant hyperplasia.

20. The method of claim 18, further comprising the step of comparing said TADG-12 mRNA to reference information, wherein said comparison determines a treatment of said malignant hyperplasia.

21. The method of claim 18, wherein said detection of TADG-12 mRNA is by PCR amplification.
22. The method of claim 21, wherein said PCR amplification uses primers selected from the group consisting of SEQ ID Nos. 28-31.

23. The method of claim 18, wherein said biological sample is selected from the group consisting of blood, urine, saliva, tears, interstitial fluid, ascites fluid, tumor tissue biopsy and circulating tumor cells.

24. A method for detecting malignant hyperplasia in a biological sample, comprising the steps of:

(a) isolating protein from said sample; and

(b) detecting a TADG-12 protein in said sample,

wherein the presence of a TADG-12 protein in said sample is indicative of the presence of malignant hyperplasia, wherein the absence of a TADG-12 protein in said sample is indicative of the absence of malignant hyperplasia.

25. The method of claim 24, further comprising the step of comparing said TADG-12 protein to reference information, wherein said comparison provides a diagnosis of said malignant hyperplasia.

26. The method of claim 24, further comprising the step of comparing said TADG-12 protein to reference information, wherein said comparison determines a treatment of said malignant hyperplasia.
27. The method of claim 24, wherein said detection is by immunoaffinity to an antibody, wherein said antibody is directed against a TADG-12 protein.

28. The method of claim 24, wherein said biological sample is selected from the group consisting of blood, urine, saliva, tears, interstitial fluid, ascites fluid, tumor tissue biopsy and circulating tumor cells.

29. A method of inhibiting expression of endogenous TADG-12 mRNA in a cell, comprising the step of:

introducing a vector into a cell, wherein said vector comprises a DNA fragment of TADG-12 in opposite orientation operably linked to elements necessary for expression, wherein expression of said vector in said cell produces TADG-12 antisense mRNA, wherein said TADG-12 antisense mRNA hybridizes to endogenous TADG-12 mRNA, thereby inhibiting expression of endogenous TADG-12 mRNA in said cell.

30. A method of inhibiting expression of a TADG-12 protein in a cell, comprising the step of:

introducing an antibody into a cell, wherein said antibody is directed against a TADG-12 protein or fragment thereof, wherein binding of said antibody to said TADG-12 protein or fragment thereof inhibits expression of said TADG-12 protein.

31. A method of targeted therapy to an individual, comprising the step of:

administering a compound to an individual, wherein said compound has a targeting moiety and a therapeutic moiety, wherein said targeting moiety is specific for a TADG-12 protein.

32. The method of claim 31, wherein said targeting moiety is selected from the group consisting of an antibody directed against a TADG-12 protein and a ligand or ligand binding domain that binds a TADG-12 protein.

33. The method of claim 32, wherein said TADG-12 protein has an amino acid sequence selected from the group consisting of SEQ ID No. 2 and SEQ ID No. 4.

34. The method of claim 31, wherein said therapeutic moiety is selected from the group consisting of a radioisotope, a toxin, a chemotherapeutic agent, an immune stimulant and a cytotoxic agent.

35. The method of claim 31, wherein said individual suffers from a disease selected from the group consisting of ovarian cancer, lung cancer, prostate cancer, colon cancer and other cancers in which TADG-12 is overexpressed.

36. A method of vaccinating an individual against TADG-12, comprising the step of inoculating the individual with a TADG-12 protein or fragment thereof, wherein said TADG-12 protein or fragment thereof lacks TADG-12 activity, wherein said inoculation with said TADG-12 protein or fragment thereof elicits an immune response in said individual, thereby vaccinating said individual against TADG-12.



37. The method of claim 36, wherein said individual has a cancer, is suspected of having a cancer or is at risk of getting a cancer.

38. The method of claim 36, wherein said TADG-12 protein has an amino acid sequence selected from the group consisting of SEQ ID No. 2 and SEQ ID No. 4.

39. The method of claim 36, wherein said TADG-12 fragment has a sequence shown in SEQ ID No. 8.

40. The method of claim 36, wherein said TADG-12 fragment is a 9-residue fragment selected from the group consisting of SEQ ID Nos. 35, 36, 55, 56, 83, 84, 97, 98, 119, 120, 122, 123 and 136.

41. An immunogenic composition, comprising an immunogenic fragment of a TADG-12 protein and an appropriate adjuvant.

42. The immunogenic composition of claim 41, wherein said immunogenic fragment of a TADG-12 protein has a sequence shown in SEQ ID No. 8.

43. The immunogenic composition of claim 41, wherein said immunogenic fragment of a TADG-12 protein is a 9-residue fragment selected from the group consisting of SEQ ID Nos. 35, 36, 55, 56, 83, 84, 97, 98, 119, 120, 122, 123 and 136.
FIG. 1A

TADG12

1   TGGGTGGTGACGGCGGCGCACTGTGTTTATGACTTGTACCTCCCCAAGTCATGGACCATC
    WVVTAAHCVYDLYLPKSWTI

61  CAGGTGGGTCTAGTTTCCCTGTTGGACAATCCAGCCCCATCCCACTTGGTGGAGAAGATT
    QVGLVSLLDNPAPSHLVEKI

(SEQ ID NO. 5)

121 GTCTACCACAGCAAGTACAAGCCAAAGAGGCTGGGCAACGACATCGCCCTCCTA
    VYHSKYKPKRLGNDIALL

(SEQ ID NO. 6)

TADG12-V

1   GGGTGGTGACGGCGGCGCACTGTGTTTATGAGATTGTAGCTCCTAGAGAAAGGGCAGACA
    VVTAAHCVYEIVAPRERADR

61  GAAGAGGAAGGAAGCTCCTGTGCTGGAGGAAACCCACAAAAATGAAAGGACCTAGACCTT
    RGRKLLCWRKPTKMKGPRPS

121 CCCATAGCTAATTCCAGTGGACCATGTTATGGCAGATACAGGCTTGTACCTCCCCAAGTC
    HS* (SEQ ID NO. 8)

181 ATGGACCATCCAGGTGGGTCTAGTTTCCCTGTTGGACAATCCAGCCCCATCCCACTTGGT
241 GGAGAAGATTGTCTACCACAGCAAGTACAAGCCAAAGAGGCTGGGCAACGACATCGCCCT
301 CCTAATCACTAGTGCGGCCGCCTGCAGG (SEQ ID NO. 7)

FIG. 1B
[Nucleotide sequence of the TADG-12 cDNA (SEQ ID NO. 1) with the deduced 454-residue amino acid sequence (SEQ ID NO. 2)]

FIG. 4






WO 00/52044






[Alignment of the LDLR-A domain of TADG12 with corresponding sequences from CompcS, Matr, Gp300-1, Gp300-2 and Tmprss2, with consensus (Cons); SEQ ID NOS. 9-14]

FIG. 5A



[Alignment of the SRCR domain of TADG12 with corresponding sequences from BovEntk, MacSR, Tmprss2 and HumEntk, with consensus (Cons); SEQ ID NOS. 15-19]
[Alignment of the serine protease domain of TADG12 with corresponding sequences from ProM, Tryl, Kal, Tmprss2 and Heps, with consensus (Cons); SEQ ID NOS. 20-25]
TADG-12
β-Tubulin

β-Tubulin
TADG-12V

FIG. 7A

FIG. 7B
SEQUENCE LISTING



<110> O'Brien, Timothy J.

Underwood, Lowell J. 
<120> Transmembrane Serine Protease Overexpressed 

in Ovarian Carcinoma and Uses Thereof 
<130> D6192PCT 
<141> 2000-03-02 
<150> 09/261,416 
<151> 1999-03-03 
<160> 153 



<210> 1 

<211> 2413 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<223> entire cDNA sequence of TADG-12 gene 

<400> 1 



cgggaaaggg ctgtgtttat gggaagccag taacactgtg gcctactatc 50 
tcttccgtgg tgccatctac atttttggga ctcgggaatt atgaggtaga 100 
ggtggaggcg gagccggatg tcagaggtcc tgaaatagtc accatggggg 150 
aaaatgatcc gcctgctgtt gaagccccct tctcattccg atcgcttttt 200 
ggccttgatg atttgaaaat aagtcctgtt gcaccagatg cagatgctgt 250 
tgctgcacag atcctgtcac tgctgccatt tgaagttttt tcccaatcat 300 
cgtcattggg gatcattgca ttgatattag cactggccat tggtctgggc 350 
atccacttcg actgctcagg gaagtacaga tgtcgctcat cctttaagtg 400 
tatcgagctg ataactcgat gtgacggagt ctcggattgc aaagacgggg 450 
aggacgagta ccgctgtgtc cgggtgggtg gtcagaatgc cgtgctccag 500 
gtgttcacag ctgcttcgtg gaagaccatg tgctccgatg actggaaggg 550 
tcactacgca aatgttgcct gtgcccaact gggtttccca agctatgtga 600 
gttcagataa cctcagagtg agctcgctgg aggggcagtt ccgggaggag 650 
tttgtgtcca tcgatcacct cttgccagat gacaaggtga ctgcattaca 700 
ccactcagta tatgtgaggg agggatgtgc ctctggccac gtggttacct 750 
tgcagtgcac agcctgtggt catagaaggg gctacagctc acgcatcgtg 800 
ggtggaaaca tgtccttgct ctcgcagtgg ccctggcagg ccagccttca 850 
gttccagggc taccacctgt gcgggggctc tgtcatcacg cccctgtgga 900 
tcatcactgc tgcacactgt gtttatgact tgtacctccc caagtcatgg 950 
accatccagg tgggtctagt ttccctgttg gacaatccag ccccatccca 1000 
cttggtggag aagattgtct accacagcaa gtacaagcca aagaggctgg 1050 
gcaatgacat cgcccttatg aagctggccg ggccactcac gttcaatgaa 1100 
atgatccagc ctgtgtgcct gcccaactct gaagagaact tccccgatgg 1150 
aaaagtgtgc tggacgtcag gatggggggc cacagaggat ggaggtgacg 1200 
cctcccctgt cctgaaccac gcggccgtcc ctttgatttc caacaagatc 1250 
tgcaaccaca gggacgtgta cggtggcatc atctccccct ccatgctctg 1300
cgcgggctac ctgacgggtg gcgtgaacag ctgccagggg gacagcgggg 1350 
ggcccctggt gtgtcaagag aggaggctgt ggaagttagt gggagcgacc 1400 
agctttggca tcggctgcgc agaggtgaac aagcctgggg tgtacacccg 1450 
tgtcacctcc ttcctggact ggatccacga gcagatggag agagacctaa 1500 
aaacctgaag aggaagggga caagtagcca cctgagttcc tgaggtgatg 1550 
aagacagccc gatcctcccc tggactcccg tgtaggaacc tgcacacgag 1600 
cagacaccct tggagctctg agttccggca ccagtagcgg gcccgaaaga 1650 
ggcacccttc catctgattc cagcacaacc ttcaagctgc tttttgtttt 1700 
ttgttttttt gaggtggagt ctcgctctgt tgcccaggct ggagtgcagt 1750 



ggcgaaatac cctgctcact gcagcctccg cttccctggt tcaagcgatt 1800
ctcttgcctc agcttcccca gtagctggga ccacaggtgc ccgccaccac 1850 
acccaactaa tttttgtatt tttagtagag acagggtttc accatgttgg 1900 
ccaggctgct ctcaaacccc tgacctcaaa tgatgtgcct gcttcagcct 1950 
cccacagtgc tgggattaca ggcatgggcc accacgccta gcctcacgct 2000 
cctttctgat cttcactaag aacaaaagaa gcagcaactt gcaagggcgg 2050 
cctttcccac tggtccatct ggttttctct ccagggtctt gcaaaattcc 2100 
tgacgagata agcagttatg tgacctcacg tgcaaagcca ccaacagcca 2150 
ctcagaaaag acgcaccagc ccagaagtgc agaactgcag tcactgcacg 2200 
ttttcatctt tagggaccag aaccaaaccc accctttcta cttccaagac 2250 
ttattttcac atgtggggag gttaatctag gaatgactcg tttaaggcct 2300
attttcatga tttctttgta gcatttggtg cttgacgtat tattgtcctt 2350 
tgattccaaa taatatgttt ccttccctca aaaaaaaaaa aaaaaaaaaa 2400 
aaaaaaaaaa aaa 2413 

<210> 2 

<211> 454 

<212> PRT 

<213> Homo sapiens 

<220> 

<223> complete amino acid sequence of TADG-12 

protein 

<400> 2 



Met Gly Glu Asn Asp Pro Pro Ala Val Glu Ala Pro Phe Ser Phe Arg
1               5                   10                  15

Ser Leu Phe Gly Leu Asp Asp Leu Lys Ile Ser Pro Val Ala Pro Asp
            20                  25                  30

Ala Asp Ala Val Ala Ala Gln Ile Leu Ser Leu Leu Pro Phe Glu Val
        35                  40                  45

Phe Ser Gln Ser Ser Ser Leu Gly Ile Ile Ala Leu Ile Leu Ala Leu
    50                  55                  60

Ala Ile Gly Leu Gly Ile His Phe Asp Cys Ser Gly Lys Tyr Arg Cys
65                  70                  75                  80

Arg Ser Ser Phe Lys Cys Ile Glu Leu Ile Thr Arg Cys Asp Gly Val
                85                  90                  95

Ser Asp Cys Lys Asp Gly Glu Asp Glu Tyr Arg Cys Val Arg Val Gly
            100                 105                 110

Gly Gln Asn Ala Val Leu Gln Val Phe Thr Ala Ala Ser Trp Lys Thr
        115                 120                 125

Met Cys Ser Asp Asp Trp Lys Gly His Tyr Ala Asn Val Ala Cys Ala
    130                 135                 140

Gln Leu Gly Phe Pro Ser Tyr Val Ser Ser Asp Asn Leu Arg Val Ser
145                 150                 155                 160

Ser Leu Glu Gly Gln Phe Arg Glu Glu Phe Val Ser Ile Asp His Leu
                165                 170                 175

Leu Pro Asp Asp Lys Val Thr Ala Leu His His Ser Val Tyr Val Arg
            180                 185                 190

Glu Gly Cys Ala Ser Gly His Val Val Thr Leu Gln Cys Thr Ala Cys
        195                 200                 205

Gly His Arg Arg Gly Tyr Ser Ser Arg Ile Val Gly Gly Asn Met Ser
    210                 215                 220

Leu Leu Ser Gln Trp Pro Trp Gln Ala Ser Leu Gln Phe Gln Gly Tyr
225                 230                 235                 240

His Leu Cys Gly Gly Ser Val Ile Thr Pro Leu Trp Ile Ile Thr Ala
                245                 250                 255

Ala His Cys Val Tyr Asp Leu Tyr Leu Pro Lys Ser Trp Thr Ile Gln
            260                 265                 270

Val Gly Leu Val Ser Leu Leu Asp Asn Pro Ala Pro Ser His Leu Val
        275                 280                 285

Glu Lys Ile Val Tyr His Ser Lys Tyr Lys Pro Lys Arg Leu Gly Asn
    290                 295                 300

Asp Ile Ala Leu Met Lys Leu Ala Gly Pro Leu Thr Phe Asn Glu Met
305                 310                 315                 320

Ile Gln Pro Val Cys Leu Pro Asn Ser Glu Glu Asn Phe Pro Asp Gly
                325                 330                 335

Lys Val Cys Trp Thr Ser Gly Trp Gly Ala Thr Glu Asp Gly Gly Asp
            340                 345                 350

Ala Ser Pro Val Leu Asn His Ala Ala Val Pro Leu Ile Ser Asn Lys
        355                 360                 365

Ile Cys Asn His Arg Asp Val Tyr Gly Gly Ile Ile Ser Pro Ser Met
    370                 375                 380

Leu Cys Ala Gly Tyr Leu Thr Gly Gly Val Asp Ser Cys Gln Gly Asp
385                 390                 395                 400

Ser Gly Gly Pro Leu Val Cys Gln Glu Arg Arg Leu Trp Lys Leu Val
                405                 410                 415

Gly Ala Thr Ser Phe Gly Ile Gly Cys Ala Glu Val Asn Lys Pro Gly
            420                 425                 430

Val Tyr Thr Arg Val Thr Ser Phe Leu Asp Trp Ile His Glu Gln Met
        435                 440                 445

Glu Arg Asp Leu Lys Thr
    450

<210> 3 

<211> 2544 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<223> entire cDNA sequence of TADG-12 variant gene 

<400> 3 



cgggaaaggg ctgtgtttat gggaagccag taacactgtg gcctactatc 50 

tcttccgtgg tgccatctac atttttggga ctcgggaatt atgaggtaga 100 

ggtggaggcg gagccggatg tcagaggtcc tgaaatagtc accatggggg 150 

aaaatgatcc gcctgctgtt gaagccccct tctcattccg atcgcttttt 200 

ggccttgatg atttgaaaat aagtcctgtt gcaccagatg cagatgctgt 250 

tgctgcacag atcctgtcac tgctgccatt tgaagttttt tcccaatcat 300

cgtcattggg gatcattgca ttgatattag cactggccat tggtctgggc 350

atccacttcg actgctcagg gaagtacaga tgtcgctcat cctttaagtg 400 

tatcgagctg ataactcgat gtgacggagt ctcggattgc aaagacgggg 450 

aggacgagta ccgctgtgtc cgggtgggtg gtcagaatgc cgtgctccag 500 

gtgttcacag ctgcttcgtg gaagaccatg tgctccgatg actggaaggg 550 

tcactacgca aatgttgcct gtgcccaact gggtttccca agctatgtaa 600 

gttcagataa cctcagagtg agctcgctgg aggggcagtt ccgggaggag 650

tttgtgtcca tcgatcacct cttgccagat gacaaggtga ctgcattaca 700 

ccactcagta tatgtgaggg agggatgtgc ctctggccac gtggttacct 750 

tgcagtgcac agcctgtggt catagaaggg gctacagctc acgcatcgtg 800 
ggtggaaaca tgtccttgct ctcgcagtgg ccctggcagg ccagccttca 850

gttccagggc taccacctgt gcgggggctc tgtcatcacg cccctgtgga 900 

tcatcactgc tgcacactgt gtttatgaga ttgtagctcc tagagaaagg 950 

gcagacagaa gaggaaggaa gctcctgtgc tggaggaaac ccacaaaaat 1000 

gaaaggacct agaccttccc atagctaatt ccagtggacc atgttatggc 1050 

agatacaggc ttgtacctcc ccaagtcatg gaccatccag gtgggtctag 1100 

tttccctgtt ggacaatcca gccccatccc acttggtgga gaagattgtc 1150 

taccacagca agtacaagcc aaagaggctg ggcaatgaca tcgcccttat 1200

gaagctggcc gggccactca cgttcaatga aatgatccag cctgtgtgcc 1250 

tgcccaactc tgaagagaac ttccccgatg gaaaagtgtg ctggacgtca 1300

ggatgggggg ccacagagga tggaggtgac gcctcccctg tcctgaacca 1350 

cgcggccgtc cctttgattt ccaacaagat ctgcaaccac agggacgtgt 1400 

acggtggcat catctccccc tccatgctct gcgcgggcta cctgacgggt 1450 

ggcgtggaca gctgccaggg ggacagcggg gggcccctgg tgtgtcaaga 1500 

gaggaggctg tggaagttag tgggagcgac cagctttggc atcggctgcg 1550

cagaggtgaa caagcctggg gtgtacaccc gtgtcacctc cttcctggac 1600 

tggatccacg agcagatgga gagagaccta aaaacctgaa gaggaagggg 1650 

acaagtagcc acctgagttc ctgaggtgat gaagacagcc cgatcctccc 1700

ctggactccc gtgtaggaac ctgcacacga gcagacaccc ttggagctct 1750 

gagttccggc accagtagcg ggcccgaaag aggcaccctt ccatctgatt 1800 

ccagcacaac cttcaagctg ctttttgttt tttgtttttt tgaggtggag 1850 

tctcgctctg ttgcccaggc tggagtgcag tggcgaaata ccctgctcac 1900 

tgcagcctcc gcttccctgg ttcaagcgat tctcttgcct cagcttcccc 1950 

agtagctggg accacaggtg cccgccacca cacccaacta atttttgtat 2000 

ttttagtaga gacagggttt caccatgttg gccaggctgc tctcaaaccc 2050 

ctgacctcaa atgatgtgcc tgcttcagcc tcccacagtg ctgggattac 2100 

aggcatgggc caccacgcct agcctcacgc tcctttctga tcttcactaa 2150 

gaacaaaaga agcagcaact tgcaagggcg gcctttccca ctggtccatc 2200 

tggttttctc tccagggtct tgcaaaattc ctgacgagat aagcagttat 2250

gtgacctcac gtgcaaagcc accaacagcc actcagaaaa gacgcaccag 2300

cccagaagtg cagaactgca gtcactgcac gttttcatct ttagggacca 2350 

gaaccaaacc caccctttct acttccaaga cttattttca catgtgggga 2400 

ggttaatcta ggaatgactc gtttaaggcc tattttcatg atttctttgt 2450 

agcatttggt gcttgacgta ttattgtcct ttgattccaa ataatatgtt 2500 

tccttccctc aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaa 2544



<210> 4 

<211> 294 

<212> PRT 

<213> Homo sapiens 

<220> 

<223> complete amino acid sequence of TADG-12 

variant protein 

<400> 4 



Met Gly Glu Asn Asp Pro Pro Ala Val Glu Ala Pro Phe Ser Phe

5 10 15

Arg Ser Leu Phe Gly Leu Asp Asp Leu Lys Ile Ser Pro Val Ala

20 25 30

Pro Asp Ala Asp Ala Val Ala Ala Gln Ile Leu Ser Leu Leu Pro

35 40 45

Phe Glu Val Phe Ser Gln Ser Ser Ser Leu Gly Ile Ile Ala Leu

50 55 60

Ile Leu Ala Leu Ala Ile Gly Leu Gly Ile His Phe Asp Cys Ser

65 70 75

Gly Lys Tyr Arg Cys Arg Ser Ser Phe Lys Cys Ile Glu Leu Ile



SEQ 4/41 



WO 00/52044



PCT/US00/05612



80 85 90 

Thr Arg Cys Asp Gly Val Ser Asp Cys Lys Asp Gly Glu Asp Glu 

95 100 105 

Tyr Arg Cys Val Arg Val Gly Gly Gln Asn Ala Val Leu Gln Val

110 115 120 

Phe Thr Ala Ala Ser Trp Lys Thr Met Cys Ser Asp Asp Trp Lys 

125 130 135 

Gly His Tyr Ala Asn Val Ala Cys Ala Gln Leu Gly Phe Pro Ser

140 145 150 

Tyr Val Ser Ser Asp Asn Leu Arg Val Ser Ser Leu Glu Gly Gln

155 160 165 

Phe Arg Glu Glu Phe Val Ser Ile Asp His Leu Leu Pro Asp Asp

170 175 180 

Lys Val Thr Ala Leu His His Ser Val Tyr Val Arg Glu Gly Cys 

185 190 195 

Ala Ser Gly His Val Val Thr Leu Gln Cys Thr Ala Cys Gly His

200 205 210 

Arg Arg Gly Tyr Ser Ser Arg Ile Val Gly Gly Asn Met Ser Leu

215 220 225 

Leu Ser Gln Trp Pro Trp Gln Ala Ser Leu Gln Phe Gln Gly Tyr

230 235 240 

His Leu Cys Gly Gly Ser Val Ile Thr Pro Leu Trp Ile Ile Thr

245 250 255 

Ala Ala His Cys Val Tyr Glu Ile Val Ala Pro Arg Glu Arg Ala

260 265 270 

Asp Arg Arg Gly Arg Lys Leu Leu Cys Trp Arg Lys Pro Thr Lys 

275 280 285 

Met Lys Gly Pro Arg Pro Ser His Ser 

290 

<210> 5 
<211> 174 
<212> DNA 

<213> Artificial sequence 

<220> 

<223> nucleotide sequence of the subclone containing 

the 180 bp band from the PCR product for TADG-12 
<400> 5 

tgggtggtga cggcggcgca ctgtgtttat gacttgtacc tccccaagtc 50 
atggaccatc caggtgggtc tagtttccct gttggacaat ccagccccat 100 
cccacttggt ggagaagatt gtctaccaca gcaagtacaa gccaaagagg 150 
ctgggcaacg acatcgccct ccta 174 

<210> 6 
<211> 58 
<212> PRT 

<213> Artificial sequence 

<220> 

<223> deduced amino acid sequence of the 180 bp band 

from the PCR product for TADG-12 
<400> 6 

Trp Val Val Thr Ala Ala His Cys Val Tyr Asp Leu Tyr Leu Pro 

5 10 15 

Lys Ser Trp Thr Ile Gln Val Gly Leu Val Ser Leu Leu Asp Asn

20 25 30

Pro Ala Pro Ser His Leu Val Glu Lys Ile Val Tyr His Ser Lys

35 40 45

Tyr Lys Pro Lys Arg Leu Gly Asn Asp Ile Ala Leu Leu

50 55



<210> 7 
<211> 328 
<212> DNA 

<213> Artificial sequence 

<220> 

<223> nucleotide sequence of the subclone containing 

the 300 bp band from the PCR product for
TADG-12 variant, which contains an additional 
insert of 133 bases 

<400> 7 



gggtggtgac ggcggcgcac tgtgtttatg agattgtagc tcctagagaa 50 

agggcagaca gaagaggaag gaagctcctg tgctggagga aacccacaaa 100 

aatgaaagga cctagacctt cccatagcta attccagtgg accatgttat 150 

ggcagataca ggcttgtacc tccccaagtc atggaccatc caggtgggtc 200 

tagtttccct gttggacaat ccagccccat cccacttggt ggagaagatt 250 

gtctaccaca gcaagtacaa gccaaagagg ctgggcaacg acatcgccct 300

cctaatcact agtgcggccg cctgcagg 328 



<210> 8 
<211> 42 
<212> PRT 

<213> Artificial sequence 

<220> 

<223> deduced amino acid sequence of the 300 bp band

from the PCR product for TADG-12 variant, which is
a truncated form of TADG-12 

<400> 8 



Val Val Thr Ala Ala His Cys Val Tyr Glu Ile Val Ala Pro Arg

5 10 15

Glu Arg Ala Asp Arg Arg Gly Arg Lys Leu Leu Cys Trp Arg Lys

20 25 30

Pro Thr Lys Met Lys Gly Pro Arg Pro Ser His Ser

35 40



<210> 9 

<211> 34 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> LDLR-A domain of the complement subunit C8 

(Compc8) 

<400> 9 



Cys Glu Gly Phe Val Cys Ala Gln Thr Gly Arg Cys Val Asn Arg

5 10 15 

Arg Leu Leu Cys Asn Gly Asp Asn Asp Cys Gly Asp Gln Ser Asp

20 25 30 






Glu Ala Asn Cys 



<210> 10 

<211> 34 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> LDLR-A domain of the serine protease 

matriptase (Matr) 
<400> 10 

Cys Pro Gly Gln Phe Thr Cys Arg Thr Gly Arg Cys Ile Arg Lys

5 10 15 

Glu Leu Arg Cys Asp Gly Trp Ala Asp Cys Thr Asp His Ser Asp 

20 25 30 

Glu Leu Asn Cys 



<210> 11 

<211> 37 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> LDLR-A domain of the glycoprotein GP300

(Gp300-1) 
<400> 11 

Cys Gln Gln Gly Tyr Phe Lys Cys Gln Ser Glu Gly Gln Cys Ile

5 10 15 

Pro Ser Ser Trp Val Cys Asp Gln Asp Gln Asp Cys Asp Asp Gly

20 25 30 

Ser Asp Glu Arg Gln Asp Cys

35 

<210> 12 

<211> 35 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> LDLR-A domain of the glycoprotein GP300 

(Gp300-2) 
<400> 12 

Cys Ser Ser His Gln Ile Thr Cys Ser Asn Gly Gln Cys Ile Pro

5 10 15 

Ser Glu Tyr Arg Cys Asp His Val Arg Asp Cys Pro Asp Gly Ala 

20 25 30 

Asp Glu Asn Asp Cys 

35 

<210> 13 
<211> 35 






<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<222> 74...108

<223> LDLR-A domain of TADG-12 

<400> 13 



Cys Ser Gly Lys Tyr Arg Cys Arg Ser Ser Phe Lys Cys Ile Glu

5 10 15 

Leu Ile Thr Arg Cys Asp Gly Val Ser Asp Cys Lys Asp Gly Glu

20 25 30 

Asp Glu Tyr Arg Cys 

35 



<210> 14 

<211> 36 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> LDLR-A domain of the serine protease TMPRSS2 

(Tmprss2)

<400> 14 



Cys Ser Asn Ser Gly Ile Glu Cys Asp Ser Ser Gly Thr Cys Ile

5 10 15

Asn Pro Ser Asn Trp Cys Asp Gly Val Ser His Cys Pro Gly Gly

20 25 30

Glu Asp Glu Asn Arg Cys

35



<210> 15 

<211> 101 

<212> PRT 

<213> Bos taurus

<220> 

<221> DOMAIN 

<223> SRCR domain of bovine enterokinase (BovEntk) 

<400> 15 



Val Arg Leu Val Gly Gly Ser Gly Pro His Glu Gly Arg Val Glu

5 10 15

Ile Phe His Glu Gly Gln Trp Gly Thr Val Cys Asp Asp Arg Trp

20 25 30

Glu Leu Arg Gly Gly Leu Val Val Cys Arg Ser Leu Gly Tyr Lys

35 40 45

Gly Val Gln Ser Val His Lys Arg Ala Tyr Phe Gly Lys Gly Thr

50 55 60

Gly Pro Ile Trp Leu Asn Glu Val Phe Cys Phe Gly Lys Glu Ser

65 70 75

Ser Ile Glu Glu Cys Arg Ile Arg Gln Trp Gly Val Arg Ala Cys

80 85 90

Ser His Asp Glu Asp Ala Gly Val Thr Cys Thr

95 100














<210> 16 

<211> 101 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> SRCR domain of human macrophage scavenger 

receptor (MacSR) 

<400> 16 



Val Arg Leu Val Gly Gly Ser Gly Pro His Glu Gly Arg Val Glu

5 10 15

Ile Leu His Ser Gly Gln Trp Gly Thr Ile Cys Asp Asp Arg Trp

20 25 30

Glu Val Arg Val Gly Gln Val Val Cys Arg Ser Leu Gly Tyr Pro

35 40 45

Gly Val Gln Ala Val His Lys Ala Ala His Phe Gly Gln Gly Thr

50 55 60

Gly Pro Ile Trp Leu Asn Glu Val Phe Cys Phe Gly Arg Glu Ser

65 70 75

Ser Ile Glu Glu Cys Lys Ile Arg Gln Trp Gly Thr Arg Ala Cys

80 85 90

Ser His Ser Glu Asp Ala Gly Val Thr Cys Thr

95 100









<210> 17 

<211> 98 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<222> 109...206

<223> SRCR domain of TADG-12 (TADG12)

<400> 17 



Val Arg Val Gly Gly Gln Asn Ala Val Leu Gln Val Phe Thr Ala

5 10 15

Ala Ser Trp Lys Thr Met Cys Ser Asp Asp Trp Lys Gly His Tyr

20 25 30

Ala Asn Val Ala Cys Ala Gln Leu Gly Phe Pro Ser Tyr Val Ser

35 40 45

Ser Asp Asn Leu Arg Val Ser Ser Leu Glu Gly Gln Phe Arg Glu

50 55 60

Glu Phe Val Ser Ile Asp His Leu Leu Pro Asp Asp Lys Val Thr

65 70 75

Ala Leu His His Ser Val Tyr Val Arg Glu Gly Cys Ala Ser Gly

80 85 90

His Val Val Thr Leu Gln Cys Thr

95



<210> 18 

<211> 94 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 






<223> SRCR domain of the serine protease TMPRSS2 

(Tmprss2)
<400> 18 



Val Arg Leu Tyr Gly Pro Asn Phe Ile Leu Gln Val Tyr Ser Ser

5 10 15












Gln Arg Lys Ser Trp His Pro Val Cys Gln Asp Asp Trp Asn Glu

20 25 30

Asn Tyr Gly Arg Ala Ala Cys Arg Asp Met Gly Tyr Lys Asn Asn

35 40 45

Phe Tyr Ser Ser Gln Gly Ile Val Asp Asp Ser Gly Ser Thr Ser

50 55 60

Phe Met Lys Leu Asn Thr Ser Ala Gly Asn Val Asp Ile Tyr Lys

65 70 75

Lys Leu Tyr His Ser Asp Ala Cys Ser Ser Lys Ala Val Val Ser

80 85 90

Leu Arg Cys Leu



<210> 19 

<211> 90 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> SRCR domain of human enterokinase (HumEntk) 

<400> 19 



Val Arg Phe Phe Asn Gly Thr Thr Asn Asn Asn Gly Leu Val Arg

5 10 15

Phe Arg Ile Gln Ser Ile Trp His Thr Ala Cys Ala Glu Asn Trp

20 25 30

Thr Thr Gln Ile Ser Asn Asp Val Cys Gln Leu Leu Gly Leu Gly

35 40 45

Ser Gly Asn Ser Ser Lys Pro Ile Phe Ser Thr Asp Gly Gly Pro

50 55 60

Phe Val Lys Leu Asn Thr Ala Pro Asp Gly His Leu Ile Leu Thr

65 70 75

Pro Ser Gln Gln Cys Leu Gln Asp Ser Leu Ile Arg Leu Gln Cys

80 85 90




<210> 20

<211> 149

<212> PRT

<213> Homo sapiens

<220>

<221> DOMAIN

<223> protease domain of protease M (ProM)

<400> 20






















Leu Trp Val Leu Thr Ala Ala His Cys Lys Lys Pro Asn Leu Gln

5 10 15

Val Phe Leu Gly Lys His Asn Leu Arg Gln Arg Glu Ser Ser Gln

20 25 30

Glu Gln Ser Ser Val Val Arg Ala Val Ile His Pro Asp Tyr Asp

35 40 45

Ala Ala Ser His Asp Gln Asp Ile Met Leu Leu Arg Leu Ala Arg














50 55 60




Pro Ala Lys Leu Ser Glu Leu Ile Gln Pro Leu Pro Leu Glu Arg

65 70 75

Asp Cys Ser Ala Asn Thr Thr Ser Cys His Ile Leu Gly Trp Gly

80 85 90

Lys Thr Ala Asp Gly Asp Phe Pro Asp Thr Ile Gln Cys Ala Tyr

95 100 105

Ile His Leu Val Ser Arg Glu Glu Cys Glu His Ala Tyr Pro Gly

110 115 120


Gln Ile Thr Gln Asn Met Leu Cys Ala Gly Asp Glu Lys Tyr Gly

125 130 135

Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Leu Val Cys

140 145





<210> 21 

<211> 151 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> protease domain of trypsinogen I (Try1)

<400> 21 



Gln Trp Val Val Ser Ala Gly His Cys Tyr Lys Ser Arg Ile Gln

5 10 15

Val Arg Leu Gly Glu His Asn Ile Glu Val Leu Glu Gly Asn Glu

20 25 30

Gln Phe Ile Asn Ala Ala Lys Ile Ile Arg His Pro Gln Tyr Asp

35 40 45

Arg Lys Thr Leu Asn Asn Asp Ile Met Leu Ile Lys Leu Ser Ser

50 55 60

Arg Ala Val Ile Asn Ala Arg Val Ser Thr Ile Ser Leu Pro Thr

65 70 75

Ala Pro Pro Ala Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly

80 85 90

Asn Thr Ala Ser Ser Gly Ala Asp Tyr Pro Asp Glu Leu Gln Cys

95 100 105

Leu Asp Ala Pro Val Leu Ser Gln Ala Lys Cys Glu Ala Ser Tyr

110 115 120

Pro Gly Lys Ile Thr Ser Asn Met Phe Cys Val Gly Phe Leu Glu

125 130 135

Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Val Val

140 145 150

Cys



<210> 22 

<211> 158 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> protease domain of plasma kallikrein (Kal) 

<400> 22 

Gln Trp Val Leu Thr Ala Ala His Cys Phe Asp Gly Leu Pro Leu

5 10 15

Gln Asp Val Trp Arg Ile Tyr Ser Gly Ile Leu Asn Leu Ser Asp

20 25 30

Ile Thr Lys Asp Thr Pro Phe Ser Gln Ile Lys Glu Ile Ile Ile

35 40 45

His Gln Asn Tyr Lys Val Ser Glu Gly Asn His Asp Ile Ala Leu

50 55 60

Ile Lys Leu Gln Ala Pro Leu Asn Tyr Thr Glu Phe Gln Lys Pro

65 70 75

Ile Cys Leu Pro Ser Lys Gly Asp Thr Ser Thr Ile Tyr Thr Asn

80 85 90

Cys Trp Val Thr Gly Trp Gly Phe Ser Lys Glu Lys Gly Glu Ile

95 100 105

Gln Asn Ile Leu Gln Lys Val Asn Ile Pro Leu Val Thr Asn Glu

110 115 120

Glu Cys Gln Lys Arg Tyr Gln Asp Tyr Lys Ile Thr Gln Arg Met

125 130 135

Val Cys Ala Gly Tyr Lys Glu Gly Gly Lys Asp Ala Cys Lys Gly

140 145 150

Asp Ser Gly Gly Pro Leu Val Cys

155

<210> 23

<211> 157

<212> PRT

<213> Homo sapiens

<220>

<221> DOMAIN

<223> protease domain of TADG-12 (TADG12)

<400> 23

Leu Trp Ile Ile Thr Ala Ala His Cys Val Tyr Asp Leu Tyr Leu

5 10 15

Pro Lys Ser Trp Thr Ile Gln Val Gly Leu Val Ser Leu Leu Asp

20 25 30

Asn Pro Ala Pro Ser His Leu Val Glu Lys Ile Val Tyr His Ser

35 40 45

Lys Tyr Lys Pro Lys Arg Leu Gly Asn Asp Ile Ala Leu Met Lys

50 55 60

Leu Ala Gly Pro Leu Thr Phe Asn Glu Met Ile Gln Pro Val Cys

65 70 75

Leu Pro Asn Ser Glu Glu Asn Phe Pro Asp Gly Lys Val Cys Trp

80 85 90

Thr Ser Gly Trp Gly Ala Thr Glu Asp Gly Gly Asp Ala Ser Pro

95 100 105

Val Leu Asn His Ala Ala Val Pro Leu Ile Ser Asn Lys Ile Cys

110 115 120

Asn His Arg Asp Val Tyr Gly Gly Ile Ile Ser Pro Ser Met Leu

125 130 135

Cys Ala Gly Tyr Leu Thr Gly Gly Val Asp Ser Cys Gln Gly Asp

140 145 150

Ser Gly Gly Pro Leu Val Cys

155

<210> 24

<211> 159



<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> protease domain of TMPRSS2 (Tmprss2) 

<400> 24 



Glu Trp Ile Val Thr Ala Ala His Cys Val Glu Lys Pro Leu Asn

5 10 15

Asn Pro Trp His Trp Thr Ala Phe Ala Gly Ile Leu Arg Gln Ser

20 25 30

Phe Met Phe Tyr Gly Ala Gly Tyr Gln Val Gln Lys Val Ile Ser

35 40 45

His Pro Asn Tyr Asp Ser Lys Thr Lys Asn Asn Asp Ile Ala Leu

50 55 60

Met Lys Leu Gln Lys Pro Leu Thr Phe Asn Asp Leu Val Lys Pro

65 70 75

Val Cys Leu Pro Asn Pro Gly Met Met Leu Gln Pro Glu Gln Leu

80 85 90

Cys Trp Ile Ser Gly Trp Gly Ala Thr Glu Glu Lys Gly Lys Thr

95 100 105

Ser Glu Val Leu Asn Ala Ala Lys Val Leu Leu Ile Glu Thr Gln

110 115 120

Arg Cys Asn Ser Arg Tyr Val Tyr Asp Asn Leu Ile Thr Pro Ala

125 130 135

Met Ile Cys Ala Gly Phe Leu Gln Gly Asn Val Asp Ser Cys Gln

140 145 150

Gly Asp Ser Gly Gly Pro Leu Val Thr

155















<210> 25

<211> 164

<212> PRT

<213> Homo sapiens

<220>

<221> DOMAIN

<223> protease domain of Hepsin (Heps)

<400> 25



Asp Trp Val Leu Thr Ala Ala His Cys Phe Pro Glu Arg Asn Arg

5 10 15

Val Leu Ser Arg Trp Arg Val Phe Ala Gly Ala Val Ala Gln Ala

20 25 30

Ser Pro His Gly Leu Gln Leu Gly Val Gln Ala Val Val Tyr His

35 40 45

Gly Gly Tyr Leu Pro Phe Arg Asp Pro Asn Ser Glu Glu Asn Ser

50 55 60

Asn Asp Ile Ala Leu Val His Leu Ser Ser Pro Leu Pro Leu Thr

65 70 75

Glu Tyr Ile Gln Pro Val Cys Leu Pro Ala Ala Gly Gln Ala Leu

80 85 90

Val Asp Gly Lys Ile Cys Thr Val Thr Gly Trp Gly Asn Thr Gln

95 100 105

Tyr Tyr Gly Gln Gln Ala Gly Val Leu Gln Glu Ala Arg Val Pro

110 115 120

Ile Ile Ser Asn Asp Val Cys Asn Gly Ala Asp Phe Tyr Gly Asn






125 130 135 

Gln Ile Lys Pro Lys Met Phe Cys Ala Gly Tyr Pro Glu Gly Gly

140 145 150 

Ile Asp Ala Cys Gln Gly Asp Ser Gly Gly Pro Phe Val Cys

155 160 



<210> 26

<211> 23

<212> DNA

<213> Artificial sequence

<220>

<221> primer_bind

<222> 6, 9, 12, 15, 18

<223> forward redundant primer for the consensus
sequences of amino acids surrounding the catalytic
triad for serine proteases, n = inosine

<400> 26

tgggtngtna cngcngcnca ytg 23



<210> 27

<211> 20

<212> DNA

<213> Artificial sequence

<220>

<221> primer_bind

<222> 3, 6, 9, 12, 15, 18

<223> reverse redundant primer for the consensus
sequences of amino acids surrounding the catalytic
triad for serine proteases, n = inosine

<400> 27

arnarngcna tntcnttncc 20



<210> 28

<211> 20

<212> DNA

<213> Artificial sequence

<220>

<221> primer_bind

<223> forward oligonucleotide primer
used for quantitative PCR

<400> 28

gaaacatgtc cttgctctcg 20



<210> 29

<211> 20

<212> DNA

<213> Artificial sequence

<220>

<221> primer_bind

<223> reverse oligonucleotide primer for TADG-12
used for quantitative PCR

<400> 29






actaacttcc acagcctcct 20 



<210> 30

<211> 20

<212> DNA

<213> Artificial sequence

<220>

<221> primer_bind

<223> forward oligonucleotide primer for TADG-12
variant (TADG-12V) used for quantitative PCR

<400> 30

tccaggtggg tctagtttcc 20

<210> 31 
<211> 20 
<212> DNA 

<213> Artificial sequence 

<220> 

<221> primer_bind 

<223> reverse oligonucleotide primer for TADG-12 

variant (TADG-12V) used for quantitative PCR
<400> 31 

ctctttggct tgtacttgct 20 

<210> 32 
<211> 20 
<212> DNA 

<213> Artificial sequence 

<220> 

<221> primer_bind 

<223> forward oligonucleotide primer for β-tubulin

used as an internal control for quantitative PCR
<400> 32 

cgcatcaacg tgtactacaa 20 

<210> 33 
<211> 20 
<212> DNA 

<213> Artificial sequence 

<220> 

<221> primer_bind

<223> reverse oligonucleotide primer for β-tubulin

used as an internal control for quantitative PCR
<400> 33 

tacgagctgg tggactgaga 20 

<210> 34 

<211> 12 

<212> PRT 

<213> Artificial sequence 
<220> 

<223> a poly-lysine-linked multiple antigen peptide






derived from the TADG-12 carboxy-terminal protein
sequence, present in full length TADG-12, but not 
in TADG-12V 
<400> 34 

Trp Ile His Glu Gln Met Glu Arg Asp Leu Lys Thr

5 10 

<210> 35 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 40...48

<223> TADG-12 peptide 

<400> 35 

Ile Leu Ser Leu Leu Pro Phe Glu Val





5 




<210> 36

<211> 9

<212> PRT

<213> Homo sapiens

<220>

<222> 144...152

<223> TADG-12 peptide

<400> 36

Ala Gln Leu Gly Phe Pro Ser Tyr Val

5




<210> 37

<211> 9

<212> PRT

<213> Homo sapiens

<220>

<222> 225...233

<223> TADG-12 peptide

<400> 37

Leu Leu Ser Gln Trp Pro Trp Gln Ala

5 

<210> 38 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 252...260

<223> TADG-12 peptide 

<400> 38 

Trp Ile Ile Thr Ala Ala His Cys Val

5 






<210> 39 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 356...364

<223> TADG-12 peptide 

<400> 39 

Val Leu Asn His Ala Ala Val Pro Leu 

5 

<210> 40 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 176...184

<223> TADG-12 peptide 

<400> 40 

Leu Leu Pro Asp Asp Lys Val Thr Ala 

5 

<210> 41 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 13...21

<223> TADG-12 peptide 

<400> 41 

Phe Ser Phe Arg Ser Leu Phe Gly Leu 

5 

<210> 42 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 151...159

<223> TADG-12 peptide 

<400> 42 

Tyr Val Ser Ser Asp Asn Leu Arg Val 

5 

<210> 43 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 436...444

<223> TADG-12 peptide 

<400> 43 






Arg Val Thr Ser Phe Leu Asp Trp Ile

5 



<210> 44 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 234...242

<223> TADG-12 peptide 

<400> 44 



Ser Leu Gln Phe Gln Gly Tyr His Leu

5 



<210> 45 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 181...189

<223> TADG-12 peptide 

<400> 45 



Lys Val Thr Ala Leu His His Ser Val 

5 



<210> 46 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 183...191

<223> TADG-12 peptide 

<400> 46 



Thr Ala Leu His His Ser Val Tyr Val 

5 



<210> 47 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 411...419

<223> TADG-12 peptide 

<400> 47 



Arg Leu Trp Lys Leu Val Gly Ala Thr 

5 



<210> 48 

<211> 9 

<212> PRT 

<213> Homo sapiens 






<220> 

<222> 60...68

<223> TADG-12 peptide 

<400> 48 



Leu Ile Leu Ala Leu Ala Ile Gly Leu

5 



<210> 49 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 227...235

<223> TADG-12 peptide 

<400> 49 



Ser Gln Trp Pro Trp Gln Ala Ser Leu

5 



<210> 50 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 301...309

<223> TADG-12 peptide 

<400> 50 



Arg Leu Gly Asn Asp Ile Ala Leu Met

5 



<210> 51 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 307...315

<223> TADG-12 peptide 

<400> 51 



Ala Leu Met Lys Leu Ala Gly Pro Leu 

5 



<210> 52 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 262...270

<223> TADG-12 peptide 

<400> 52 



Asp Leu Tyr Leu Pro Lys Ser Trp Thr 

5 






<210> 53 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 416...424

<223> TADG-12 peptide 

<400> 53 

Leu Val Gly Ala Thr Ser Phe Gly Ile

5 

<210> 54 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 54...62

<223> TADG-12 peptide 

<400> 54 

Ser Leu Gly Ile Ile Ala Leu Ile Leu

5 

<210> 55 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 218...226

<223> TADG-12 peptide 

<400> 55 

Ile Val Gly Gly Asn Met Ser Leu Leu

5 

<210> 56 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 35...43

<223> TADG-12 peptide 

<400> 56 

Ala Val Ala Ala Gln Ile Leu Ser Leu

5 

<210> 57 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 271...279

<223> TADG-12 peptide 

<400> 57 






Ile Gln Val Gly Leu Val Ser Leu Leu

5 



<210> 58 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 397...405

<223> TADG-12 peptide 

<400> 58 



Cys Gln Gly Asp Ser Gly Gly Pro Leu

5 



<210> 59 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 270...278

<223> TADG-12 peptide 

<400> 59 



Thr Ile Gln Val Gly Leu Val Ser Leu

5 



<210> 60 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 56...64

<223> TADG-12 peptide 

<400> 60 



Gly Ile Ile Ala Leu Ile Leu Ala Leu

5 



<210> 61 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 110...118

<223> TADG-12 peptide 

<400> 61 



Arg Val Gly Gly Gln Asn Ala Val Leu

5 



<210> 62 

<211> 9 

<212> PRT 

<213> Homo sapiens 






<220> 

<222> 217...225

<223> TADG-12 peptide 

<400> 62 



Arg Ile Val Gly Gly Asn Met Ser Leu

5 



<210> 63 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 130...138

<223> TADG-12 peptide 

<400> 63 



Cys Ser Asp Asp Trp Lys Gly His Tyr 

5 



<210> 64 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 8...16

<223> TADG-12 peptide 

<400> 64 



Ala Val Glu Ala Pro Phe Ser Phe Arg 

5 



<210> 65 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 328...336

<223> TADG-12 peptide 

<400> 65 



Asn Ser Glu Glu Asn Phe Pro Asp Gly 

5 



<210> 66 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 3...11

<223> TADG-12 peptide 

<400> 66 



Glu Asn Asp Pro Pro Ala Val Glu Ala 

5 






<210> 67 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 98...106

<223> TADG-12 peptide 

<400> 67 

Asp Cys Lys Asp Gly Glu Asp Glu Tyr 

5 

<210> 68 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 346...354

<223> TADG-12 peptide 

<400> 68 

Ala Thr Glu Asp Gly Gly Asp Ala Ser 

5 

<210> 69 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 360...368

<223> TADG-12 peptide 

<400> 69 

Ala Ala Val Pro Leu Ile Ser Asn Lys

5 

<210> 70 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 153...161

<223> TADG-12 peptide 

<400> 70 

Ser Ser Asp Asn Leu Arg Val Ser Ser 

5 

<210> 71 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 182...190

<223> TADG-12 peptide 

<400> 71 






Val Thr Ala Leu His His Ser Val Tyr 

5 



<210> 72 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 143...151

<223> TADG-12 peptide 

<400> 72 



Cys Ala Gln Leu Gly Phe Pro Ser Tyr

5 



<210> 73 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 259...267

<223> TADG-12 peptide 

<400> 73 



Cys Val Tyr Asp Leu Tyr Leu Pro Lys 

5 



<210> 74 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 369...377

<223> TADG-12 peptide 

<400> 74 



Ile Cys Asn His Arg Asp Val Tyr Gly

5 



<210> 75 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 278...286

<223> TADG-12 peptide 

<400> 75 



Leu Leu Asp Asn Pro Ala Pro Ser His 

5 



<210> 76 

<211> 9 

<212> PRT 

<213> Homo sapiens 






<220> 

<222> 426 . . .434 

<223> TADG-12 peptide 

<400> 76 



Cys Ala Glu Val Asn Lys Pro Gly Val 

5 



<210> 77

<211> 9

<212> PRT

<213> Homo sapiens

<220>

<222> 32...40

<223> TADG-12 peptide

<400> 77

Asp Ala Asp Ala Val Ala Ala Gln Ile

5 



<210> 78 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 406 . . .414 

<223> TADG-12 peptide 

<400> 78 



Val Cys Gln Glu Arg Arg Leu Trp Lys

5 



<210> 79

<211> 9

<212> PRT

<213> Homo sapiens

<220>

<222> 329...337

<223> TADG-12 peptide

<400> 79

Ser Glu Glu Asn Phe Pro Asp Gly Lys

5 



<210> 80

<211> 9

<212> PRT

<213> Homo sapiens

<220>

<222> 303...311

<223> TADG-12 peptide

<400> 80

Gly Asn Asp Ile Ala Leu Met Lys Leu

5 






<210> 81 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 127...135

<223> TADG-12 peptide 

<400> 81 

Lys Thr Met Cys Ser Asp Asp Trp Lys 

5 

<210> 82 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 440 . . .448 

<223> TADG-12 peptide 

<400> 82 

Phe Leu Asp Trp Ile His Glu Gln Met

5 

<210> 83 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 433 . . .441 

<223> TADG-12 peptide 

<400> 83 

Val Tyr Thr Arg Val Thr Ser Phe Leu 

5 

<210> 84 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 263 . . . 271 

<223> TADG-12 peptide 

<400> 84 

Leu Tyr Leu Pro Lys Ser Trp Thr Ile

5

<210> 85 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 169...177

<223> TADG-12 peptide 

<400> 85 






Glu Phe Val Ser Ile Asp His Leu Leu

5 

<210> 86 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 296 . . .304 

<223> TADG-12 peptide 

<400> 86 

Lys Tyr Lys Pro Lys Arg Leu Gly Asn

5 

<210> 87 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 16...24

<223> TADG-12 peptide 

<400> 87 

Arg Ser Leu Phe Gly Leu Asp Asp Leu 

5 

<210> 88 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 267 . . .275 

<223> TADG-12 peptide 

<400> 88

Lys Ser Trp Thr Ile Gln Val Gly Leu

5 

<210> 89 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 81. . .89 

<223> TADG-12 peptide 

<400> 89 

Arg Ser Ser Phe Lys Cys Ile Glu Leu

5 



<210> 90 
<211> 9 
<212> PRT 






<213> Homo sapiens 
<220> 

<222> 375. . .383 

<223> TADG-12 peptide 

<400> 90 



Val Tyr Gly Gly Ile Ile Ser Pro Ser

5 



<210> 91 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 110. . .118 

<223> TADG-12 peptide 

<400> 91 



Arg Val Gly Gly Gln Asn Ala Val Leu

5 



<210> 92 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 189. . .197 

<223> TADG-12 peptide 

<400> 92 



Val Tyr Val Arg Glu Gly Cys Ala Ser 

5 



<210> 93 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 165... 173 

<223> TADG-12 peptide 

<400> 93 



Gln Phe Arg Glu Glu Phe Val Ser Ile

5 



<210> 94 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 10. . .18 

<223> TADG-12 peptide 

<400> 94 



Glu Ala Pro Phe Ser Phe Arg Ser Leu 

5 






<210> 95 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 407 . . .415 

<223> TADG-12 peptide 

<400> 95 



Cys Gln Glu Arg Arg Leu Trp Lys Leu

5 



<210> 96 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 381. . .389 

<223> TADG-12 peptide 

<400> 96 



Ser Pro Ser Met Leu Cys Ala Gly Tyr 

5 



<210> 97 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 375...383

<223> TADG-12 peptide 

<400> 97 



Val Tyr Gly Gly Ile Ile Ser Pro Ser

5 



<210> 98 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 381...389

<223> TADG-12 peptide 

<400> 98 



Ser Pro Ser Met Leu Cys Ala Gly Tyr 

5 



<210> 99 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 362 . . .370 

<223> TADG-12 peptide






PCT/US00/05612



<400> 99 

Val Pro Leu Ile Ser Asn Lys Ile Cys

5 

<210> 100 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 373 . . .381 

<223> TADG-12 peptide 

<400> 100 

Arg Asp Val Tyr Gly Gly Ile Ile Ser

5 

<210> 101 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 283 . . .291 

<223> TADG-12 peptide 

<400> 101 

Ala Pro Ser His Leu Val Glu Lys Ile

5 

<210> 102 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 177...185

<223> TADG-12 peptide 

<400> 102 

Leu Pro Asp Asp Lys Val Thr Ala Leu 

5 

<210> 103 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 47 ... 55 

<223> TADG-12 peptide 

<400> 103 

Glu Val Phe Ser Gln Ser Ser Ser Leu

5 

<210> 104 

<211> 9 

<212> PRT 






<213> Homo sapiens 
<220> 

<222> 36. . .44 

<223> TADG-12 peptide 

<400> 104 

Val Ala Ala Gln Ile Leu Ser Leu Leu

5 

<210> 105 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 255 . . .263 

<223> TADG-12 peptide 

<400> 105 

Thr Ala Ala His Cys Val Tyr Asp Leu 

5 

<210> 106 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 138. . .146 

<223> TADG-12 peptide 

<400> 106 

Tyr Ala Asn Val Ala Cys Ala Gln Leu

5 

<210> 107 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 195 . . .203 

<223> TADG-12 peptide 

<400> 107 

Cys Ala Ser Gly His Val Val Thr Leu 

5 

<210> 108 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 215 . . .223 

<223> TADG-12 peptide 

<400> 108 

Ser Ser Arg Ile Val Gly Gly Asn Met

5 






<210> 109 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 298. . .306 

<223> TADG-12 peptide 

<400> 109 



Lys Pro Lys Arg Leu Gly Asn Asp Ile

5 



<210> 110 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 313 . . .321 

<223> TADG-12 peptide 

<400> 110 



Gly Pro Leu Thr Phe Asn Glu Met Ile

5 



<210> 111

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 108 . . . 116 

<223> TADG-12 peptide 

<400> 111 



Cys Val Arg Val Gly Gly Gln Asn Ala

5 



<210> 112 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 294. . .302 

<223> TADG-12 peptide 

<400> 112 



His Ser Lys Tyr Lys Pro Lys Arg Leu 

5 

<210> 113 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 265 . . .273 

<223> TADG-12 peptide 






<400> 113 

Leu Pro Lys Ser Trp Thr Ile Gln Val

5 

<210> 114 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 88 ... 96 

<223> TADG-12 peptide 

<400> 114 

Glu Leu Ile Thr Arg Cys Asp Gly Val

5 

<210> 115 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 79. . .87 

<223> TADG-12 peptide 

<400> 115 

Arg Cys Arg Ser Ser Phe Lys Cys Ile

5 

<210> 116 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 255...263

<223> TADG-12 peptide 

<400> 116 

Thr Ala Ala His Cys Val Tyr Asp Leu 

5 

<210> 117 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 207 . . .215 

<223> TADG-12 peptide 

<400> 117 

Ala Cys Gly His Arg Arg Gly Tyr Ser 

5 

<210> 118 
<211> 9 
<212> PRT 






<213> Homo sapiens 
<220> 

<222> 154 . . . 162 

<223> TADG-12 peptide 

<400> 118 



Ser Asp Asn Leu Arg Val Ser Ser Leu 

5 



<210> 119 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 300 . . .308 

<223> TADG-12 peptide 

<400> 119 



Lys Arg Leu Gly Asn Asp Ile Ala Leu

5 



<210> 120 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 435 . . .443 

<223> TADG-12 peptide 

<400> 120 



Thr Arg Val Thr Ser Phe Leu Asp Trp 

5 



<210> 121 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 376...384

<223> TADG-12 peptide 

<400> 121 



Tyr Gly Gly Ile Ile Ser Pro Ser Met

5 



<210> 122 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 410 . . .418 

<223> TADG-12 peptide 

<400> 122 



Arg Arg Leu Trp Lys Leu Val Gly Ala 

5 






<210> 123 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 210...218

<223> TADG-12 peptide 

<400> 123 



His Arg Arg Gly Tyr Ser Ser Arg Ile

5 



<210> 124 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 109 . . . 117 

<223> TADG-12 peptide 

<400> 124 



Val Arg Val Gly Gly Gln Asn Ala Val

5 



<210> 125 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 191. . .199 

<223> TADG-12 peptide 

<400> 125 

Val Arg Glu Gly Cys Ala Ser Gly His 

5 



<210> 126 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 78 ... 86 

<223> TADG-12 peptide 

<400> 126 



Tyr Arg Cys Arg Ser Ser Phe Lys Cys 

5 



<210> 127 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 113 . . .121 
<223> TADG-12 peptide




<400> 127 

Gly Gln Asn Ala Val Leu Gln Val Phe

5 

<210> 128 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 91...99

<223> TADG-12 peptide

<400> 128 

Thr Arg Cys Asp Gly Val Ser Asp Cys 

5 

<210> 129 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 38 ... 46 

<223> TADG-12 peptide 

<400> 129 

Ala Gln Ile Leu Ser Leu Leu Pro Phe

5 

<210> 130 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 211. . .219 

<223> TADG-12 peptide 

<400> 130 

Arg Arg Gly Tyr Ser Ser Arg Ile Val

5 

<210> 131 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 216 . . .224 

<223> TADG-12 peptide 

<400> 131 

Ser Arg Ile Val Gly Gly Asn Met Ser

5 

<210> 132 

<211> 9 

<212> PRT 




<213> Homo sapiens 
<220> 

<222> 118...126

<223> TADG-12 peptide 

<400> 132 

Leu Gln Val Phe Thr Ala Ala Ser Trp

5 

<210> 133 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 370. . .378 

<223> TADG-12 peptide 

<400> 133 

Cys Asn His Arg Asp Val Tyr Gly Gly 

5 

<210> 134 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 393... 401 

<223> TADG-12 peptide 

<400> 134 

Gly Val Asp Ser Cys Gln Gly Asp Ser

5 

<210> 135 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 235. . .243 

<223> TADG-12 peptide 

<400> 135 

Leu Gln Phe Gln Gly Tyr His Leu Cys

5 

<210> 136 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 427...435

<223> TADG-12 peptide 

<400> 136 

Ala Glu Val Asn Lys Pro Gly Val Tyr 

5 






<210> 137 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 162...170

<223> TADG-12 peptide 

<400> 137 



Leu Glu Gly Gln Phe Arg Glu Glu Phe

5 



<210> 138 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 9 ... 17 

<223> TADG-12 peptide 

<400> 138 



Val Glu Ala Pro Phe Ser Phe Arg Ser 

5 



<210> 139 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 318. . .326 

<223> TADG-12 peptide 

<400> 139 

Asn Glu Met Ile Gln Pro Val Cys Leu

5 



<210> 140 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 256 . . .264 

<223> TADG-12 peptide 

<400> 140 



Ala Ala His Cys Val Tyr Asp Leu Tyr 

5 



<210> 141 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 46...54

<223> TADG-12 peptide 




<400> 141 

Phe Glu Val Phe Ser Gln Ser Ser Ser

5 

<210> 142 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 64 . . .72 

<223> TADG-12 peptide 

<400> 142 

Leu Ala Ile Gly Leu Gly Ile His Phe

5 

<210> 143 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 192 . . .200 

<223> TADG-12 peptide 

<400> 143 



Arg Glu Gly Cys Ala Ser Gly His Val 

5 



<210> 144 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 330... 338 

<223> TADG-12 peptide 

<400> 144 



Glu Glu Asn Phe Pro Asp Gly Lys Val 

5 



<210> 145 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 182. . .190 

<223> TADG-12 peptide 

<400> 145 

Val Thr Ala Leu His His Ser Val Tyr 

5 



<210> 146 
<211> 9 
<212> PRT 






<213> Homo sapiens 
<220> 

<222> 408. . .416 

<223> TADG-12 peptide 

<400> 146 



Gln Glu Arg Arg Leu Trp Lys Leu Val

5 



<210> 147

<211> 9

<212> PRT

<213> Homo sapiens

<220>

<222> 206...214

<223> TADG-12 peptide

<400> 147

Thr Ala Cys Gly His Arg Arg Gly Tyr

5 



<210> 148

<211> 9

<212> PRT

<213> Homo sapiens

<220>

<222> 5...13

<223> TADG-12 peptide

<400> 148

Asp Pro Pro Ala Val Glu Ala Pro Phe

5 



<210> 149

<211> 9

<212> PRT

<213> Homo sapiens

<220>

<222> 261...269

<223> TADG-12 peptide

<400> 149

Tyr Asp Leu Tyr Leu Pro Lys Ser Trp

5 



<210> 150

<211> 9

<212> PRT

<213> Homo sapiens

<220>

<222> ...41

<223> TADG-12 peptide

<400> 150

Ala Asp Ala Val Ala Ala Gln Ile Leu

5 



<210> 151

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 168 . . .176 

<223> TADG-12 peptide 

<400> 151 



Glu Glu Phe Val Ser Ile Asp His Leu

5 



<210> 152 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 304 . . .312 

<223> TADG-12 peptide 

<400> 152 



Asn Asp Ile Ala Leu Met Lys Leu Ala

5 



<210> 153 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 104. . .112 

<223> TADG-12 peptide 

<400> 153 



Asp Glu Tyr Arg Cys Val Arg Val Gly 

5 
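Each entry above uses the numeric tags of the WIPO ST.25 sequence-listing standard: <210> is the SEQ ID NO, <211> the length, <212> the type, <213> the organism, <222> the location of the feature, <223> its description and <400> the sequence. Every TADG-12 peptide here is a 9-residue window, so its <222> span should cover exactly nine positions, and overlapping windows should agree residue for residue. The sketch below assumes the entries have been cleaned into the one-tag-per-line layout shown. The helpers `parse_entry` and `overlap_agrees` are hypothetical illustrations, not part of the listing. They convert an entry to one-letter code and check it for internal consistency and against an overlapping peptide.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_entry(text):
    """Parse one cleaned peptide entry into a dict (hypothetical helper)."""
    fields, residues = {}, []
    for line in (raw.strip() for raw in text.splitlines()):
        if not line:
            continue
        tag = re.match(r"<(\d{3})>\s*(.*)", line)
        if tag:
            fields[tag.group(1)] = tag.group(2)
        elif all(tok in THREE_TO_ONE for tok in line.split()):
            residues.extend(line.split())
        # Other lines, such as the "5" position ruler, are ignored.
    numbers = re.findall(r"\d+", fields["222"])  # e.g. "8...16" -> ["8", "16"]
    start, end = int(numbers[0]), int(numbers[-1])
    sequence = "".join(THREE_TO_ONE[r] for r in residues)
    length = int(fields["211"])
    return {
        "seq_id": int(fields["210"]),
        "sequence": sequence,
        "start": start,
        "end": end,
        # Residue count and <222> span should both equal the <211> length.
        "consistent": len(sequence) == length and end - start + 1 == length,
    }


def overlap_agrees(a, b):
    """True if two parsed peptides carry the same residues where they overlap."""
    lo, hi = max(a["start"], b["start"]), min(a["end"], b["end"])
    if lo > hi:
        return True  # no shared positions to compare
    return (a["sequence"][lo - a["start"]:hi - a["start"] + 1]
            == b["sequence"][lo - b["start"]:hi - b["start"] + 1])


ENTRY_64 = """<210> 64
<211> 9
<212> PRT
<213> Homo sapiens
<220>
<222> 8...16
<223> TADG-12 peptide
<400> 64
Ala Val Glu Ala Pro Phe Ser Phe Arg
5"""

ENTRY_138 = """<210> 138
<211> 9
<212> PRT
<213> Homo sapiens
<220>
<222> 9...17
<223> TADG-12 peptide
<400> 138
Val Glu Ala Pro Phe Ser Phe Arg Ser
5"""

if __name__ == "__main__":
    e64, e138 = parse_entry(ENTRY_64), parse_entry(ENTRY_138)
    print(e64["sequence"], e64["consistent"], overlap_agrees(e64, e138))
```

Entries 64 (positions 8 to 16) and 138 (positions 9 to 17) share eight positions. A check like this could flag a mis-transcribed residue or coordinate in an OCR'd listing.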



Role of NH2-terminal Positively Charged Residues in Establishing
Membrane Protein Topology* 



(Received for publication, April 14, 1993, and in revised form, May 21, 1993)



Griffith D. Parks‡ and Robert A. Lamb§

From the Howard Hughes Medical Institute and Department of Biochemistry, Molecular Biology and Cell Biology, Northwestern
University, Evanston, Illinois 60208-3500



The paramyxovirus HN polypeptide is a model type II membrane protein, which contains an internal uncleaved signal/anchor (S/A) and is oriented in the membrane with an NH2-terminal cytoplasmic domain and a COOH-terminal ectodomain (Ncyt topology). To test the role of NH2-terminal positively charged residues in directing the HN membrane topology, the 3 arginine (Arg) residues within the 17-amino-acid NH2-terminal domain were systematically converted to a glutamine or glutamate, and the topology of the mutant proteins was examined after expression in CV-1 cells. The data indicate that: (i) each of the NH2-terminal Arg residues contributes to the signal directing proper HN topology, since substitutions in any of the three positions resulted in ~13-23% inversion into the Nexo form; (ii) substitutions in the Arg directly flanking the signal/anchor domain resulted in slightly more inversion than those which were located more distally; and (iii) substitution with a negatively charged glutamate led to more inversion than did replacement with an uncharged glutamine. The effect of a single Arg to Glu substitution on the HN topology was enhanced when present in the context of a truncated NH2-terminal cytoplasmic tail (3 residues). A comparison of the sequences flanking the signal/anchor of well documented type III proteins showed that the majority of these proteins contain a negatively charged residue flanking the NH2-terminal side. An exception to this rule is the NB protein, which contains a single positively charged Arg residue in this position. A chimeric protein containing the NB ectodomain and the HN S/A and HN ectodomain led to a significant fraction (70%) of the chimeric protein adopting type II topology, suggesting that the positive charge flanking the S/A domain is important for establishing type II topology. These data are discussed in the context of the loop model for the biogenesis of integral membrane proteins and the possible signals necessary for establishing differing orientations.



* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

‡ Associate of the Howard Hughes Medical Institute. Present address: Dept. of Microbiology and Immunology, Bowman Gray School of Medicine of Wake Forest University, Winston-Salem, NC 27157-1064.

§ Investigator of the Howard Hughes Medical Institute. To whom correspondence should be addressed: Dept. of Biochemistry, Molecular Biology and Cell Biology, 2153 Sheridan Rd., Evanston, IL 60208-3500. Tel.: 708-491-5433; Fax: 708-491-2467.

The ability of an integral membrane protein to function properly depends on the precise targeting of the cytoplasmic and extracellular domains of the polypeptide to the correct side of the membrane. The signals directing a protein into a characteristic membrane topology are contained within the amino acid sequence of the polypeptide (Blobel, 1980). These signals must be very precise, as it appears that all naturally occurring membrane proteins adopt only a single final orientation. The majority of known membrane proteins which span the lipid bilayer a single time are classified as type I proteins (nomenclature of von Heijne and Gavel, 1988). This classification is based on the presence of both an NH2-terminal cleavable signal sequence and a separate COOH-terminal hydrophobic domain. The signal sequence targets the nascent polypeptide to the ER¹ membrane through an interaction with the signal recognition particle (SRP; Walter and Lingappa, 1986), and the COOH-terminal domain acts as a stop transfer domain (membrane anchor). These proteins have an extracellular NH2-terminal domain and a cytoplasmic COOH-terminal tail (Nexo topology). A second class of membrane proteins has been found, with fewer known members than the type I membrane proteins, in which the proteins adopt the opposite orientation and have an NH2-terminal cytoplasmic tail and a COOH-terminal ectodomain (Ncyt topology). These type II proteins lack an NH2-terminal cleavable signal sequence, but contain an internal hydrophobic signal/anchor (S/A) which serves a dual function: the signaling of the nascent polypeptide to the ER membrane and the subsequent anchoring of the polypeptide in the lipid bilayer. Examples of type II proteins include the transferrin receptor (Schneider et al., 1984), the asialoglycoprotein receptor (Spiess and Lodish, 1986), the family of Golgi-resident glycosyltransferases (Paulson and Colley, 1989), and the paramyxovirus HN protein (Hiebert et al., 1985). The least common class of membrane proteins that span the lipid bilayer a single time are the type III proteins. These proteins also contain an internal uncleaved S/A, but they have an extracellular NH2-terminal domain and are in the Nexo orientation. Examples of type III proteins include the cytochrome P-450 proteins (Nelson and Strobel, 1988), the erythrocyte sialoglycoprotein β (High and Tanner, 1987), the influenza A virus M2 protein and the influenza B virus NB protein (Lamb et al., 1986; Williams and Lamb, 1986).

In contrast to the cleavable signal sequences of the type I membrane proteins, relatively little is known about the structural features which distinguish the two types of membrane proteins with internal uncleaved S/A sequences. The type I signal sequences have been analyzed in detail, both by amino acid comparison (von Heijne, 1984, 1985) and experimentally (e.g., Nothwehr and Gordon, 1989). The type II and III proteins both appear to use the same SRP-mediated mechanism for targeting to the ER membrane (Lipp and Dobberstein, 1986b; Hull et al., 1988). However, the signals which direct the steps following

¹ The abbreviations used are: ER, endoplasmic reticulum; SRP, signal recognition particle; N-glycanase, peptide:N-glycosidase F; PAGE, polyacrylamide gel electrophoresis.

Fig. 1. Structure and expression of HN* arginine substitution mutants. A, schematic diagram of Arg substitution mutants. The amino acid sequence of the NH2-terminal domain of HN WT* is shown in the one-letter code, with the HN signal/anchor depicted as a hatched box. A solid horizontal line denotes sequence identity to WT*, with glutamate (E) or glutamine (Q) substitutions shown
this interaction of the S/A with SRP and lead to exclusively the Ncyt or Nexo topology have not been determined. Hydrophobicity appears to be the only structural requirement for an uncleaved S/A to function in the targeting and anchoring of a polypeptide (Audigier et al., 1987; Parks et al., 1989; Zerial et al., 1987). As such, the analysis of topogenic sequences of type II and III proteins has focused on residues flanking the S/A domain. It has been shown that these two types of proteins can be inverted in the membrane by complete exchange of the NH2- or COOH-terminal S/A-flanking regions (Haeuptle et al., 1989; Parks et al., 1989; Parks and Lamb, 1991). Two different hypotheses have been proposed to explain the orientation of type II and III integral membrane proteins. Both are based on a theoretical analysis of the amino acid sequences flanking S/A domains in sequences available from databases. (a) The "charge difference" rule (Hartmann et al., 1989) proposed that, when the difference in the sum of positive and negative charges within 15 residues of the NH2- and COOH-terminal sides of the S/A domain was calculated, the more positive side was cytoplasmic, in the manner of a dipole moment. (b) The "positive inside" rule (von Heijne, 1986; von Heijne and Gavel, 1988) proposed that the topology of the protein is governed by positive charges alone, and that the domain containing the most positive charges is cytoplasmic. However, in the case of two different type II proteins, data obtained from a systematic mutational analysis did not support either the charge difference rule or the positive inside rule (Beltzer et al., 1991; Parks and Lamb, 1991). The experimental data indicated that positive charges in the NH2-terminal domain of type II proteins play a pivotal role in directing the Ncyt topology. Removing positive charges from the NH2-terminal S/A-flanking region leads to inversion of type II proteins into the Nexo orientation, whereas adding positive charges to the COOH-terminal S/A-flanking region alone has little effect on topology (Beltzer et al., 1991; Parks and Lamb, 1991).
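The two hypotheses differ only in what they count, and a short calculation makes that concrete. The Python below is a sketch under stated assumptions. It takes a one-letter sequence with known, 0-based, end-exclusive S/A boundaries. It counts Lys/Arg as +1 and Asp/Glu as -1, ignoring His and the terminal groups; this charge scheme is an assumption. It also applies the 15-residue flanks named for the charge difference rule to both rules. The function names and the toy sequences are hypothetical. Because the mutational data cited above did not support either rule, this illustrates the arithmetic only and is not a topology predictor.

```python
def net_charge(segment):
    """Net charge of a one-letter segment: Lys/Arg +1, Asp/Glu -1.

    Assumption: His and the free terminal groups are ignored.
    """
    positive = sum(1 for aa in segment if aa in "KR")
    negative = sum(1 for aa in segment if aa in "DE")
    return positive - negative


def positive_count(segment):
    """Number of Lys/Arg residues in a segment."""
    return sum(1 for aa in segment if aa in "KR")


def flanks(seq, sa_start, sa_end, width=15):
    """NH2- and COOH-terminal flanks of an S/A given as 0-based, end-exclusive indices."""
    return seq[max(0, sa_start - width):sa_start], seq[sa_end:sa_end + width]


def charge_difference_rule(seq, sa_start, sa_end, width=15):
    """Predict topology from the net-charge difference (COOH flank minus NH2 flank)."""
    n_flank, c_flank = flanks(seq, sa_start, sa_end, width)
    delta = net_charge(c_flank) - net_charge(n_flank)
    if delta > 0:
        return "Nexo"  # COOH side more positive, so it is predicted cytoplasmic
    if delta < 0:
        return "Ncyt"  # NH2 side more positive, so it is predicted cytoplasmic
    return "undetermined"


def positive_inside_rule(seq, sa_start, sa_end, width=15):
    """Predict topology from positive charges alone."""
    n_flank, c_flank = flanks(seq, sa_start, sa_end, width)
    n_pos, c_pos = positive_count(n_flank), positive_count(c_flank)
    if n_pos > c_pos:
        return "Ncyt"
    if c_pos > n_pos:
        return "Nexo"
    return "undetermined"


# Hypothetical toy sequences (not HN): 20 Leu stand in for the S/A.
TOY_A = "MSRRKE" + "L" * 20 + "DDSTNGSDE"  # S/A spans indices 6-25
TOY_B = "RDDD" + "L" * 20 + "NNNN"         # S/A spans indices 4-23

if __name__ == "__main__":
    for name, seq, start in (("TOY_A", TOY_A, 6), ("TOY_B", TOY_B, 4)):
        end = start + 20
        print(name, charge_difference_rule(seq, start, end),
              positive_inside_rule(seq, start, end))
```

The two toy sequences show when the rules agree and when they disagree. In TOY_A, both predict Ncyt. In TOY_B, a single Arg next to three Asp residues makes the NH2 flank net negative, so the charge difference rule predicts Nexo. That flank still holds the only positive charge, however, so the positive inside rule predicts Ncyt.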

In an analysis of charge-altered HN mutants (Parks and Lamb, 1991), it was proposed that the HN orientation signal is composed at least in part of a positively charged residue directly flanking the NH2-terminal side of the S/A. However, the potential role of positively charged residues located more distal to the S/A was not tested, and it has been postulated that these residues may also contribute to the orientation signal (High and Dobberstein, 1992). Here we report a systematic mutational analysis of the NH2-terminal positively charged residues of the HN protein cytoplasmic tail and their effect on HN orientation. The data indicate that each of the 3 NH2-terminal Arg residues contributes to the signal directing the type II topology, since charge-altering mutations in these residues lead to polypeptides which can adopt the inverted Nexo orientation. The ability to invert the HN topology by these substitutions depends on the distance of the mutation from the S/A and on the charge of the substituting residue. The effect of these alterations is enhanced in the context of a truncated NH2-terminal domain. These results are discussed in a model for the topogenic signals of type I, II, and III proteins.



MATERIALS AND METHODS 

Cells — Monolayer cultures of CV-1 cells were grown in Dulbecco's modified Eagle's medium containing 10% fetal calf serum as described (Lamb and Lai, 1982).

Plasmid Construction and Mutagenesis — To construct a pGEM3 plasmid containing a bacteriophage T7 RNA polymerase transcription terminator (pGem3-term), the appropriate 570-base pair fragment was excised from pGemex-2 (Promega, Madison, WI) by digestion with NdeI and HindIII and inserted into the NdeI and HindIII sites of pGEM3. A cDNA clone of the SV5 HN protein gene (Hiebert et al., 1985) was modified previously to encode the addition of a consensus site for N-linked glycosylation (Asn-Ala-Thr) near the NH2 terminus of the protein (HN*; Parks and Lamb, 1991). A fragment from this clone (encoding residues 1-81) was inserted into a bacteriophage M13 vector and used as starting material for oligonucleotide-directed mutagenesis as described (Parks et al., 1989). Likewise, a cDNA clone encoding a deletion of 14 of the 17 NH2-terminal residues (HNGl; Parks and Lamb, 1990) was used as starting material for the construction of mutants MVE and MVQ. Following mutagenesis, DNA fragments were excised from the replicative form of M13 by digestion with EcoRI and PstI. They were then linked to a DNA fragment encoding HN residues 82-565 in pGem3-term (Arg substitution mutants) or pGem11 (MVR, MVE, and MVQ), so that mRNA sense transcripts could be produced from the T7 RNA polymerase promoter. Nucleotide sequences were confirmed by dideoxynucleotide chain-terminating sequencing (Sanger et al., 1977).
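The engineered Asn-Ala-Thr site can accept a carbohydrate because it matches the general N-linked glycosylation sequon, Asn-X-Ser/Thr, where X is any residue except Pro. Below is a minimal sketch of scanning a one-letter sequence for such motifs. `find_sequons` is a hypothetical helper. A match marks only a potential site, because a sequon is glycosylated only if it reaches the ER lumen, which is exactly what the HN* reporter exploits.

```python
import re

# N-X-S/T with X != Pro; the lookahead lets overlapping motifs all be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq):
    """0-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_sequons("NAT"))         # the engineered Asn-Ala-Thr motif: [0]
    print(find_sequons("MNATNPSNGS"))  # N-P-S at index 4 is skipped: [1, 7]
```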

To construct the gene encoding the chimeric protein NBHH, a cDNA fragment encoding a portion of the influenza virus B/Lee/40 segment 6 gene (bases 1-58; Shaw et al., 1982) was fused to HN using standard polymerase chain reaction protocols to create the precise junction of the NB NH2-terminal domain and the HN S/A domain (Arg/Thr). The construction of the gene encoding the M2/HN chimeric protein M2HH has been described previously (Parks et al., 1989).

Isotopic Labeling of Polypeptides, Immunoprecipitation, N-Glycanase Digestions, Protease Treatment of Microsomal Membranes, and Polyacrylamide Gel Electrophoresis — Proteins were expressed in CV-1 cells as described (Parks and Lamb, 1991), using a modified version of the vaccinia virus/T7 RNA polymerase system of Fuerst et al. (1986). Cells infected with vaccinia virus vTF7-3 were transfected with pGEM plasmid DNA encoding the HN mutants. They were radiolabeled from 3.5 to 4.5 h postinfection with 20-50 μCi/ml Tran[35S]-label (ICN Radiochemicals Inc., Irvine, CA) in Dulbecco's modified Eagle's medium lacking cysteine and methionine. Radiolabeled cells were washed in phosphate-buffered saline before lysis in 1% SDS. Immunoprecipitation of proteins from cell extracts with antisera to denatured HN (HN antisera) was as described previously (Erickson and Blobel, 1979; Ng et al., 1990). Deglycosylation of proteins by treatment with peptide:N-glycosidase F (N-glycanase) was carried out as described (Williams and Lamb, 1986). Microsomal membranes were prepared from vaccinia virus-infected cells by Dounce homogenization (Adams and Rose, 1985) and analyzed by trypsin digestion as described previously (Parks et al., 1989). Samples were analyzed by SDS-PAGE on 10% polyacrylamide gels, followed by fluorography (Lamb and Choppin, 1976). Autoradiograms were quantitated using a Molecular Dynamics model 400 series Phosphorimager (Sunnyvale, CA), and the values represent the average of at least two experiments.
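The quantitation reduces to a per-experiment ratio followed by an average. The sketch below assumes that percent Nexo is the Nexo band signal divided by the summed Ncyt and Nexo signals from the same lane. Both forms share one polypeptide backbone, so their label content per molecule is assumed to be equal. The hypothetical helpers below compute that ratio and average it over at least two experiments, matching the minimum stated above. The numbers in the usage line are made up for illustration.

```python
from statistics import mean


def percent_nexo(ncyt_signal, nexo_signal):
    """Percentage of a mutant found in the inverted Nexo form (assumed definition)."""
    total = ncyt_signal + nexo_signal
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * nexo_signal / total


def average_percent_nexo(experiments, min_experiments=2):
    """Average percent Nexo over experiments given as (ncyt, nexo) signal pairs."""
    if len(experiments) < min_experiments:
        raise ValueError(f"need at least {min_experiments} experiments")
    return mean(percent_nexo(ncyt, nexo) for ncyt, nexo in experiments)


if __name__ == "__main__":
    # Two hypothetical phosphorimager readings for one mutant.
    print(average_percent_nexo([(3.0, 1.0), (1.0, 1.0)]))
```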

below their position in the HN NH2-terminal domain. The location of the NH2-terminal consensus site for N-linked glycosylation is highlighted by an asterisk. Vertical arrows indicate the location of the altered Arg residues. Nomenclature for the mutants is described in the text. Percent Nexo values represent the average of at least two experiments. B, expression of Arg substitution mutants. CV-1 cells infected with vaccinia virus vTF7-3 were transfected with DNA plasmids encoding the Arg substitution mutants. Polypeptides were radiolabeled from 3.5-4.5 h postinfection with Tran[35S]-label, immunoprecipitated with HN antisera, and analyzed by SDS-PAGE. Ncyt and Nexo denote polypeptides with the WT HN and inverted membrane orientations, respectively.

Nomenclature — The nomenclature for type I-III proteins follows that of von Heijne and Gavel (1988). For the purposes of discussion, the borders of the S/A are operationally defined as the first charged residues located on either side of the first hydrophobic membrane-spanning region. The HN Arg substitution mutants (Fig. 1) are denoted by a numbering system which is a continuation of that used previously (Parks and Lamb, 1991). The HN cytoplasmic domain mutants MVR, MVE, and MVQ are named for the 3 residues which comprise the tail of these proteins. Hybrid proteins NBHH and M2HH are denoted by letters which represent the origin of the NH2-terminal domain (NB or M2), with the transmembrane domain and cytoplasmic domain being derived from HN (H). The M2 NH2-terminal domain used (M2) contains a site for addition of N-linked carbohydrate (Parks et al., 1989).
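The operational border definition can be made explicit in code. The sketch below is an illustration under assumptions, not the authors' procedure. It locates the "first hydrophobic membrane-spanning region" with a swapped-in Kyte-Doolittle sliding window: 19 residues with a mean hydropathy of at least 1.6. It then walks outward from the center of that window to the first Asp, Glu, Lys or Arg on each side. `sa_borders` and its parameters are hypothetical. His is treated as uncharged, and the window center is assumed to be uncharged itself.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGED = set("DEKR")  # assumption: His treated as uncharged


def first_hydrophobic_window(seq, window=19, threshold=1.6):
    """Start index of the first window whose mean hydropathy >= threshold."""
    for i in range(len(seq) - window + 1):
        if sum(KD[aa] for aa in seq[i:i + window]) / window >= threshold:
            return i
    return None


def sa_borders(seq, window=19, threshold=1.6):
    """0-based indices of the first charged residue on each side of the first
    hydrophobic segment; None overall if no segment, or None for a side
    that has no charged residue."""
    start = first_hydrophobic_window(seq, window, threshold)
    if start is None:
        return None
    center = start + window // 2  # assumed to lie inside the hydrophobic core
    left = center
    while left >= 0 and seq[left] not in CHARGED:
        left -= 1
    right = center
    while right < len(seq) and seq[right] not in CHARGED:
        right += 1
    return (left if left >= 0 else None, right if right < len(seq) else None)


if __name__ == "__main__":
    # Hypothetical toy sequence: Glu at index 4 and Asp at index 25 bracket 20 Leu.
    print(sa_borders("MSRRE" + "L" * 20 + "DGST"))
```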

RESULTS 

Role of HN NH2-terminal Arg Residues in Topogenesis — To examine experimentally how NH2-terminal positively charged residues in the cytoplasmic tail of a type II integral membrane protein direct membrane topology, a series of charge-altered mutants was produced. In these mutants, the 3 NH2-terminal Arg residues of HN were converted individually (Fig. 1A, mutants 10*, 12*, 14*-27*) or in combination (mutants 18*-25*) to a negatively charged glutamate (E) or an uncharged glutamine (Q). As a means of directly monitoring expression in the Nexo form, each of these mutants also contained a single site for the addition of an N-linked carbohydrate residue, which had been inserted near the HN NH2 terminus (HN*; Parks and Lamb, 1991). It was anticipated that glycosylation of the NH2-terminal domain of HN molecules inverted into the Nexo topology would result in a species with a slower electrophoretic mobility than that of unglycosylated HN. This would allow three classes of molecule to be distinguished: those having the HN Ncyt orientation (four accessible COOH-terminal glycosylation sites), those showing bona fide inversion into the Nexo form (one accessible NH2-terminal glycosylation site), and unglycosylated polypeptides which were defective in membrane targeting. The HN mutants were expressed to high levels by first infecting CV-1 cells with a recombinant vaccinia virus which synthesizes T7 RNA polymerase (Fuerst et al., 1986). The cells were then transfected with DNA plasmids encoding the mutants under control of the T7 promoter. After the cells were radiolabeled with 35S-labeled amino acids, polypeptides were immunoprecipitated from cell extracts using HN antisera and examined by SDS-PAGE.
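The three expected outcomes can be summarized as a small lookup, assuming every luminally exposed sequon is used. In the hypothetical model below, HN* carries one engineered NH2-terminal site and four COOH-terminal ectodomain sites, as stated above. `expected_glycans` and `infer_orientation` are illustrative names. In the experiments themselves, the distinction was read from electrophoretic mobility rather than from counted glycans.

```python
# Hypothetical model of HN* as described in the text: one engineered
# NH2-terminal sequon plus four sequons in the COOH-terminal ectodomain.
SITES = {"nh2_domain": 1, "cooh_domain": 4}


def expected_glycans(orientation):
    """Glycans expected if every luminally exposed sequon is used (assumption)."""
    if orientation == "Ncyt":
        return SITES["cooh_domain"]  # ectodomain faces the ER lumen
    if orientation == "Nexo":
        return SITES["nh2_domain"]   # only the NH2-terminal site is luminal
    if orientation == "untargeted":
        return 0                     # never reaches the ER lumen
    raise ValueError(f"unknown orientation: {orientation!r}")


def infer_orientation(observed_glycans):
    """Map an observed glycan count back to a topology class."""
    for orientation in ("Ncyt", "Nexo", "untargeted"):
        if expected_glycans(orientation) == observed_glycans:
            return orientation
    return "ambiguous"


if __name__ == "__main__":
    for count in (4, 1, 0, 2):
        print(count, infer_orientation(count))
```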

As shown in Fig. 1B, each of the charge-altered mutants was synthesized, to varying degrees, as a mixture of two major polypeptides: a species with an electrophoretic mobility closely matching that of HN WT* (Ncyt) and a faster-migrating species denoted Nexo. The slight differences in the electrophoretic mobilities of the mutant polypeptides most likely reflect aberrant migration due to their charge differences. With each mutant, a single species that migrated faster than the Nexo form was generated after removal of the carbohydrate residues by N-glycanase treatment, indicating that the two electrophoretic species observed in Fig. 1B share a single polypeptide backbone and differ only in glycosylation (data not shown, but see Parks and Lamb, 1991). Trace amounts of polypeptides that migrate faster than the Nexo form are degradation products and have an electrophoretic mobility distinct from that of deglycosylated HN (data not shown). Pulse-chase experiments indicated that the Ncyt and Nexo forms of the mutant proteins were relatively stable (data not shown); thus, a comparison of the fraction of each mutant found in the Nexo form is a valid measure of the relative effect of each mutation on topogenesis. Phosphorimager quantitation of the Ncyt and Nexo species from several experiments showed that 13-23% of each of the single Arg mutants was expressed in the inverted Nexo form (Fig. 1B, left panel).
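Each percentage above is the Nexo band expressed as a fraction of the total HN signal in a lane. A minimal sketch of that arithmetic follows, assuming background-subtracted Phosphorimager band intensities as input; because both bands share one 35S-labeled polypeptide backbone, intensity is taken as proportional to the number of molecules. The helper names and the replicate intensities are hypothetical, not data from this study.

```python
from statistics import mean


def percent_exo(n_cyt: float, n_exo: float) -> float:
    """Percent of one lane's HN signal found in the inverted Nexo band.

    Assumes background-subtracted intensities; both forms carry the same
    35S amino acid label, so intensity ~ number of molecules.
    """
    total = n_cyt + n_exo
    if total <= 0:
        raise ValueError("no HN signal in lane")
    return 100.0 * n_exo / total


def summarize(lanes: list[tuple[float, float]]) -> float:
    """Mean percent Nexo over replicate experiments (hypothetical helper)."""
    return mean(percent_exo(c, e) for c, e in lanes)


# Hypothetical (Ncyt, Nexo) intensities from three replicate gels.
replicates = [(820.0, 180.0), (790.0, 210.0), (805.0, 195.0)]
print(f"{summarize(replicates):.1f}% Nexo")  # prints "19.5% Nexo"
```

Averaging per-lane percentages, rather than pooling raw intensities, keeps gels with different overall labeling from dominating the mean.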

When 2 of the 3 HN NH2-terminal cytoplasmic domain Arg residues were mutated (Fig. 1B, middle panel, mutants 18*-23*), significantly more of the HN protein was inverted in the membrane than with the single Arg substitutions. Within each pair of mutants, substitution of an Arg residue by a negatively charged Glu resulted in slightly more efficient expression in the Nexo form than replacement of the Arg by an uncharged Gln residue (e.g. compare mutant 16* with 19*). Furthermore, substitution of the Arg located closest to the S/A led to greater expression in the Nexo form than did substitution of Arg residues more distal to the S/A; this is most clearly seen by comparing mutants 18* (56% Nexo) and 22* (44% Nexo). The largest inversion of the HN orientation was seen with mutant 24*, in which all of the Arg residues had been converted to Glu, and ~80% of this protein was oriented in the Nexo form (Fig. 1B, 24* lane). Taken together, these data suggest that substitution of each of the NH2-terminal Arg residues leads to inversion of the HN type II topology, but that the positions closest to the S/A are more sensitive to these charge alterations.

To determine whether a single Arg residue directly flanking the S/A was sufficient to direct the type II topology, a mutant HN* protein was constructed (Fig. 2, 26*) in which both Arg 11 and Arg 15 were converted to uncharged Gln residues, leaving only Arg 19, which directly flanks the S/A. When HN mutant 26* was expressed in CV-1 cells by the vaccinia virus T7 RNA polymerase system described above, two major polypeptides were detected (Fig. 2, − lane), and both forms had an electrophoretic mobility slower than that of the single polypeptide produced after removal of the carbohydrate residues by treatment with N-glycanase (+ lane). Phosphorimager quantitation of the relative amounts of the two forms showed that 25% of this protein was expressed in the Nexo orientation. Although the ability of each of the other 2 Arg residues to direct the Ncyt orientation by itself has not been tested, these data indicate that a single S/A-flanking positively charged residue is sufficient to direct 75% of the molecules into the type II topology. Furthermore, a comparison of the HN 26* mutant (25% Nexo) with the 22* mutant shown in Fig. 1B (44% Nexo) supports the above contention that substitution of 2 Arg residues by negatively charged Glu leads to greater inversion of HN than substitution with uncharged Gln residues.
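To make the charge bookkeeping behind these mutants concrete, the sketch below applies Arg substitutions to the HN WT* NH2-terminal sequence displayed in Fig. 2 (MVNATEDAPVRATCRVLFR, with Arg at positions 11, 15, and 19) and reports the net side-chain charge of the tail. It assumes Arg and Lys count +1, Asp and Glu count −1, and His counts 0; the free α-amino group is ignored by default but can be added, since the text invokes it to explain the MVQ result. The substitutions for 24* and 26* are taken from the text; the helper names are hypothetical.

```python
# Side-chain charges near neutral pH (simplifying assumption: His counted as 0).
CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}

HN_WT_TAIL = "MVNATEDAPVRATCRVLFR"  # HN WT* NH2-terminal sequence shown in Fig. 2


def mutate(seq: str, subs: dict[int, str]) -> str:
    """Apply substitutions given as {1-based position: new residue}."""
    residues = list(seq)
    for pos, new in subs.items():
        residues[pos - 1] = new
    return "".join(residues)


def net_charge(seq: str, include_n_terminus: bool = False) -> int:
    """Sum of side-chain charges; optionally add +1 for the free alpha-amino group."""
    total = sum(CHARGE.get(aa, 0) for aa in seq)
    return total + (1 if include_n_terminus else 0)


mutants = {
    "WT*": {},
    "24* (R11E/R15E/R19E)": {11: "E", 15: "E", 19: "E"},
    "26* (R11Q/R15Q)": {11: "Q", 15: "Q"},
}
for name, subs in mutants.items():
    tail = mutate(HN_WT_TAIL, subs)
    print(f"{name:22s} {tail}  net charge {net_charge(tail):+d}")
```

Under these assumptions the tail goes from +1 (WT*) to −1 (26*) and −5 (24*). A single net-charge number cannot capture the positional effects described above, where Arg residues nearest the S/A matter most.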

Effect of Arg Substitutions in the Context of a Truncated NH2-terminal Domain. In the case of two other type II membrane proteins, IγCAT (Lipp and Dobberstein, 1986a) and the asialoglycoprotein receptor (Schmid and Spiess, 1988), truncations of the NH2-terminal cytoplasmic tail resulted in molecules that were cleaved at a cryptic site in the S/A, and these processed polypeptides were soluble within the ER lumen. Analysis of the orientation of a cytoplasmic tail deletion mutant of a related HN protein (from Newcastle disease virus) suggested that the mutant protein was of mixed orientation (Wilson et al., 1990). In contrast, when an SV5 HN mutant was constructed and expressed that has the NH2-terminal domain truncated from 17 residues to the 3-residue tail MVR, a single major glycosylated species was detected (Fig. 3, MVR lanes). The available data indicate that the mutant MVR protein is integrated in the lipid bilayer (Parks and Lamb, 1990). We do not have a simple explanation for the different results obtained with the two related HN cytoplasmic tail mutants, except that one study examined membrane integration in vitro and the other in vivo. Because the data obtained with the MVR mutant were not complicated by a competing signal peptidase-like cleavage, this mutant provided the opportunity to examine the effect of Arg substitutions within the context of the truncated MVR cytoplasmic tail.

Two mutants were constructed in which the single Arg residue in the MVR tail was converted to a Glu (E) or Gln (Q) residue, producing mutant proteins with the NH2-terminal domains MVE and MVQ (Fig. 3). Expression of the MVQ mutant using the vaccinia virus system described above (MVQ lanes) produced a protein profile that matched that produced by the MVR protein. For both MVR and MVQ, trace amounts of a faster-migrating species were also observed (lanes MVR− and MVQ−), and these species have an electrophoretic mobility different from that of deglycosylated MVR and MVQ (+ lanes); they most likely represent degradation products. In contrast, the MVE protein was synthesized as two major polypeptide species: one that migrated like the Ncyt form of MVR, and a faster-migrating Nexo polypeptide with a mobility matching that of the single protein resulting from N-glycanase treatment (MVE lanes). Alkali treatment of microsomal membranes from cells expressing the MVE mutant did not remove either of these two protein species from the membrane (data not shown). However, the formal demonstration of transmembrane topology by using proteases to trim a segment of the cytoplasmic tail could not be performed, because the small size of the cytoplasmic tail precludes a detectable shift in the electrophoretic mobility of the trimmed form. Although these data do not provide formal proof that the NH2-terminal domain of the Nexo form of MVE has been fully translocated across the ER membrane, the strong association of both MVE species with the membrane suggests that the lack of glycosylation of the Nexo form was due to inversion into the type III orientation and not to defective integration into the membrane. Quantitation of the two forms of the MVE protein synthesized during a 1-h labeling period indicated that 50% of the MVE molecules adopted the inverted Nexo form. Mutant MVQ was not inverted in membranes, in contrast to the same membrane-proximal mutation made in the full 19-residue WT* tail (mutant 12*) (0 versus 18% in the Nexo form). A possible explanation is that the loss of the S/A-flanking positive charge in the MVQ mutant is compensated for by the positive charge contributed by the adjacent NH2 terminus of this truncated protein. As the MVE mutant contained the same membrane-proximal mutation as mutant 10* and yet showed a different level of protein inversion (50 versus 23%), it lends further credence to the notion that other charged residues in the cytoplasmic domain are important in establishing orientation.

Fig. 2. Effect of a single NH2-terminal S/A-flanking Arg residue on HN topology. CV-1 cells were infected with vaccinia vTF7-3 and transfected with a DNA plasmid encoding HN mutant 26*. After radiolabeling with Tran[35S]label, polypeptides were immunoprecipitated from cell extracts with HN antisera. Immune complexes were divided into two portions, incubated with (+) or without (−) N-glycanase, and the polypeptides were examined by SDS-PAGE. The NH2-terminal amino acid sequence of HN WT* is shown with the locations of the 2 Arg residues converted to Gln to create the 26* mutant indicated by arrows.

[Fig. 2 image: mutant 26* lanes (− and + N-glycanase) with the Ncyt and Nexo positions marked; % Nexo for 26*: 25. HN WT* NH2-terminal sequence: MVNATEDAPVRATCRVLFR, followed by the S/A.]

The NH2-terminal Ectodomain of the Type III NB Protein Can Function as a Type II Cytoplasmic Tail. A compilation of the amino acid sequences of known type II membrane proteins shows that the vast majority of these proteins (~90%) have a positively charged residue (Arg or Lys) directly flanking the NH2-terminal cytoplasmic side of the S/A (for compilations, see reviews by Paulson and Colley, 1989; Hartmann et al., 1989), and the importance of this positive charge for type II membrane protein topogenesis has been demonstrated experimentally (Parks and Lamb, 1991). For the small number of naturally occurring proteins that are exceptions to this correlation and lack an NH2-terminal positively charged S/A-flanking residue, the presence of a negative charge in this position may be compensated for by a long stretch of positive charges located more distal (NH2-terminal) to the S/A (e.g. neutral endopeptidase; Malfroy et al., 1988). This suggestion was made previously in formulating the positive inside rule for membrane protein topogenesis (von Heijne and Gavel, 1988) and is supported by the experimental data shown in Fig. 1. In comparison to type II membrane proteins, there are relatively few known examples of the oppositely oriented type III proteins, but the vast majority of these have a negatively charged Glu or Asp residue directly flanking the NH2-terminal side of the S/A (Fig. 4). One exception to this correlation is the influenza B virus NB protein (Williams and Lamb, 1986), which contains a single NH2-terminal positively charged residue flanking the S/A domain. Earlier work has shown that when a chimeric protein, M2HH, composed of the NH2-terminal ectodomain of the type III M2 protein linked to the HN S/A and COOH-terminal domains, was expressed, the chimera integrated into membranes in two opposing orientations, with the Nexo orientation predominating (Parks and Lamb, 1991; see Fig. 5). As the NH2-terminal domain of NB has an S/A-proximal positive charge but is functionally a type III ectodomain, it was of interest to determine which would be the predominant factor when this portion of the NB protein was linked to the HN S/A and COOH-terminal domains in a chimeric protein, NBHH.

The NBHH chimeric protein was expressed in CV-1 cells using the vaccinia T7 system and was found as two predominant species (Fig. 5A, NBHH lanes): 70% as an Ncyt species with a mobility similar to that of the HN WT* protein (WT* lanes), and 30% as a faster-migrating Nexo form. The difference in electrophoretic mobility between these two forms of NBHH was due to glycosylation (the Nexo form has two and the Ncyt form has four glycosylation sites), as only a single NBHH polypeptide species, with mobility identical to that of deglycosylated WT*, was detected after N-glycanase treatment (NBHH, + lanes). The membrane orientation of the two NBHH species was further examined biochemically. Both the NBHH Ncyt and Nexo forms were resistant to alkali extraction (data not shown), and the NBHH Ncyt form (like HN WT*) was protected from digestion when microsomal membranes were treated with trypsin, whereas the faster-migrating NBHH Nexo form was susceptible to protease digestion (Fig. 5B). Taken together, these data suggest that the NB NH2-terminal ectodomain is capable of acting as a cytoplasmic tail when linked to the HN S/A domain.

Fig. 3. The topological effect of charge alterations is enhanced in the context of a truncated HN NH2-terminal domain. CV-1 cells infected with vaccinia virus vTF7-3 were transfected with plasmids encoding HN mutants MVR, MVE, or MVQ. Polypeptides were radiolabeled, immunoprecipitated with HN antisera, digested with (+) or without (−) N-glycanase, and analyzed by SDS-PAGE as described for Fig. 2. The NH2-terminal sequence of the mutants is listed below that of HN, with the position of the altered Arg residue indicated by a vertical arrow.

[Fig. 3 image: lanes MVR, MVE, and MVQ, each − and + N-glycanase.]



DISCUSSION 

All nascent polypeptide chains use a common machinery for targeting to the ER membrane (Walter and Lingappa, 1986), and yet, by comparison, very little amino acid identity is found among signal sequences. This is illustrated by comparative sequence analysis (von Heijne, 1985) as well as experimentally, where it has been shown that seemingly random peptide sequences can function in targeting to the secretory pathway (Kaiser et al., 1987; Paterson and Lamb, 1990). Likewise, the mechanism that follows this targeting to the membrane and leads to exclusively one orientation in the lipid bilayer must be precise and at the same time must recognize degenerate topogenic signals, as there is little amino acid sequence identity among the many membrane proteins that share the same topology. Recent data indicate that charged residues are an important part of the signal for determining membrane protein topology (Beltzer et al., 1991; Haeuptle et al., 1989; Parks and Lamb, 1991).

The data obtained from a systematic analysis of the role of each of the HN NH2-terminal Arg residues in determining the topology of the protein allow several conclusions to be drawn about key features of membrane protein topology (reviewed in Boyd and Beckwith, 1990; High and Dobberstein, 1992) that, although speculated on previously, had not been examined experimentally. First, each of the 3 HN Arg residues contributes to the signal directing the Ncyt topology, with substitutions at the proximal S/A-flanking position leading to more inversion into the Nexo form than substitutions at the distal positions. It was shown previously that the S/A-flanking Arg residue is very important in establishing orientation; however, charge alterations of this residue did not lead to complete inversion of HN in the membrane (Parks and Lamb, 1991). The observation that the inversion of HN was only partial can thus be explained by the presence of the other two NH2-terminal Arg residues, and HN can be nearly completely inverted to the Nexo form (80%) by replacing all 3 Arg residues with Glu. The finding that the NB ectodomain can direct the Ncyt topology to approximately the same extent as the HN 26* mutant (which contains only a single S/A-flanking Arg) lends further support to the proposal that the exact sequence of a cytoplasmic tail is less critical for generating the type II topology than the position and number of positive charges (Parks and Lamb, 1991). Second, the relative importance of a given positively charged residue in contributing to the signal for topogenesis may depend on the length of the NH2-terminal tail, since HN is inverted in the membrane to a greater extent when a charge alteration is introduced into a truncated tail than when it is introduced in the context of the full-length NH2-terminal domain. Likewise, in the case of the asialoglycoprotein receptor (Beltzer et al., 1991), 2 Arg to Asp substitutions lead to greater inversion in the membrane when introduced in the context of an NH2-terminal tail that has been truncated from 40 residues (3% Nexo) to 11 residues (65% Nexo). Thus, the orientation signal may depend on the position and charge density of the positive charges, and these two factors could explain the few examples of type II proteins that have a negatively charged residue flanking the NH2-terminal side of the S/A (e.g. neutral endopeptidase; Malfroy et al., 1988). Third, substitution of Arg by a negatively charged Glu was a more potent inducer of inversion of HN orientation than replacement with an uncharged Gln (i.e. ~8-14% more in the Nexo form in the double Arg mutants). These data indicate that the inversion of HN orientation by these Arg substitutions was not due simply to the lack of a positive charge, and they suggest that negative charges may act to promote translocation across the ER membrane. These observations contrast with findings made in bacteria, where the orientation of an inner membrane protein can be reversed by the addition or removal of a single positively charged residue, but negative charges do not affect topology unless they are present in very high numbers (Nilsson and von Heijne, 1990; Andersson et al., 1992).

Fig. 4. Comparison of the amino acid sequences of type III proteins. The 12 amino acids flanking the amino (NH2) and carboxyl (COOH) sides of the transmembrane domain (TM) of known type III (Nexo) proteins are listed in one-letter code. The borders of the TM are operationally defined as the first charged residue on either side of the hydrophobic domain. In some instances (e.g. IBV E1 protein), the first transmembrane domain of a multispanning membrane protein has been shown to be an uncleaved S/A with the Nexo topology, and the relevant sequences of these proteins are included for completeness. This list may not be comprehensive, but it includes those proteins for which there is reasonable biochemical evidence for type III topology. IBV, infectious bronchitis virus; LMu-CSF, long form of the multilineage colony-stimulating factor; rec., receptor; red., reductase; R., rat; M., murine; H., human; B., bovine; Y., yeast; AEV, avian erythroblastosis virus; UR2, avian sarcoma virus UR2. The references used are: 1) Nelson and Strobel, 1988; 2) Liu and Inglis, 1991; 3) Takumi et al., 1988; 4) Machamer and Rose, 1987; 5) Haeuptle et al., 1989; 6) Nathans et al., 1986; 7) Schofield et al., 1987; 8) Frielle et al., 1987; 9) Nathans and Hogness, 1983; 10) Feldheim et al., 1992; 11) Porter and Kasper, 1985; 12) High and Tanner, 1987; 13) Masu et al., 1987; 14) Neckameyer et al., 1985; 15) Bergmann et al., 1989; 16) Julius et al., 1988; 17) Lamb et al., 1985; 18) Williams and Lamb, 1986; 19) Kobilka et al., 1988; 20) Schatzman et al., 1986.

A comparative analysis of the amino acids that make up cleavable signal sequences indicates that these signals are composed of three domains: a positively charged NH2-terminal region, a central short stretch of hydrophobic residues, and a COOH-terminal region containing small polar residues that defines the site of cleavage by signal peptidase (von Heijne, 1984, 1985). The uncleaved S/A of a typical type II protein is structurally very similar to a type I signal sequence, and it has been shown experimentally that, except for the presence of a site for cleavage by signal peptidase in the type I proteins, these two signal sequences are functionally equivalent. A type II S/A can be converted to a cleavable signal sequence by NH2-terminal alterations that expose a cryptic cleavage site (Lipp and Dobberstein, 1986a; Schmid and Spiess, 1988), and conversely, a type I cleavable signal sequence can function as an uncleaved S/A when modified by extending the NH2-terminal flanking domain and blocking the cleavage site (Shaw et al., 1988). Based on these structural and functional similarities, it has been proposed that type I and II proteins share a common mechanism for membrane integration and topogenesis (von Heijne and Blomberg, 1979; Inouye and Halegoua, 1980; Engelman and Steitz, 1981; Shaw et al., 1988). In this model, the nascent polypeptide is presented to the ER membrane as a loop structure formed by holding both the NH2- and COOH-terminal sides of the signal sequence on the cytoplasmic side of the lipid bilayer, with the NH2-terminal retention signal composed at least in part of positively charged residues (reviewed in High and Dobberstein, 1992).

Fig. 5. Expression and biochemical characterization of the NBHH hybrid protein. A, expression of NBHH. Vaccinia virus vTF7-3-infected cells were transfected with plasmid DNA encoding HN WT*, NBHH, or M2HH. Proteins were radiolabeled, immunoprecipitated with HN antisera, incubated with (+) or without (−) N-glycanase, and analyzed by SDS-PAGE as described in the legend to Fig. 2. The positions of the Ncyt and Nexo polypeptides are indicated. B, proteinase treatment of microsomal membranes from cells expressing WT* and NBHH. Vaccinia virus vTF7-3-infected cells were transfected with plasmids encoding HN WT* (lanes 1 and 2) or NBHH (lanes 3 and 4) and were radiolabeled with Tran[35S]label. Crude microsomal membranes were prepared and treated with buffer (lanes 1 and 3) or with trypsin (lanes 2 and 4) as described previously (Parks et al., 1989). Following centrifugation, samples were immunoprecipitated with HN antisera and analyzed by SDS-PAGE. The NH2-terminal sequences of HN WT* and of the chimeric NBHH and M2HH proteins are shown below, with a cross-hatched box and horizontal lines denoting the HN S/A and COOH-terminal ectodomain, respectively. The locations of the consensus sites for N-linked glycosylation are highlighted by asterisks.

[Fig. 5 image. A: lanes WT*, NBHH, and M2HH (− and + N-glycanase), with Ncyt and Nexo positions marked. B: WT* (lanes 1 and 2) and NBHH (lanes 3 and 4). NH2-terminal sequences preceding the HN S/A, with % Nexo: WT*, MVNATEDAPVRATCRVLFR; NBHH, MNNATFNCTNINPITHIR, 30%; M2HH, MSNLTEVETPIRNEWGCRCNDSSD, 65%.]

In contrast to the establishment of type II protein orientation, the rules determining type III protein orientation remain enigmatic. Type III proteins depend on SRP for membrane targeting and integration (Hull et al., 1988) and may be presented initially to the membrane as a loop structure (for a schematic diagram, see the review by High and Dobberstein, 1992), but, lacking the cytoplasmic retention signal, the NH2 terminus of these proteins would be translocated across the bilayer. As initially proposed to explain the topogenesis of the first Nexo transmembrane domain of opsin (Audigier et al., 1987), the NH2-terminal region of all nascent membrane proteins (types I-III) may bind to an as yet unidentified factor to form the common loop structure, but for type III proteins this binding may be more readily dissociated, leading to "flipping" of the NH2 terminus across the ER membrane. The ability to vary the inversion of HN into the Nexo form by NH2-terminal charge alterations may reflect the degree of dissociation of the mutant NH2 terminus from this putative binding factor, with positively charged residues being held more tightly than negatively charged residues. In Escherichia coli, the acidic SecA protein appears to interact directly with positive charges in the signal sequence of nascent type I proteins during translocation across the cytoplasmic membrane (Akita et al., 1990). Although a protein analogous to SecA has not yet been identified in eukaryotic cells, recent cross-linking and reconstitution studies have identified several ER membrane proteins that may be directly involved in forming an aqueous pore across membranes (reviewed in Rapoport, 1992). These proteins are therefore candidates for interacting with the NH2-terminal positive charges of a nascent polypeptide chain. Alternatively, type III proteins may employ a distinct topogenic mechanism, whereby the NH2 terminus is not bound to form the transient loop structure but is presented to the ER membrane in a "head-on" configuration.

The experimental data described here indicate that a type II protein can be converted into the Nexo topology by NH2-terminal charge alterations, and these data therefore address, indirectly, the nature of the topogenic signals of naturally occurring type III proteins. Although a type III protein can be converted experimentally to a type II protein by complete exchange of the S/A-flanking domains (Parks and Lamb, 1991), a direct systematic test of the role of individual proximal and distal charges in generating the type III topology has yet to be performed. In the M2HH chimera, the type III M2 ectodomain, which lacks an S/A-flanking positively charged residue, directed 65% of the molecules into the type III orientation, whereas in the NBHH chimera the type III NB ectodomain, which contains a positively charged residue flanking the S/A domain, directed 70% of the molecules into the opposing HN type II orientation. Thus, the signal for establishing type III topology may be complex and consist of the NH2-terminal ectodomain in conjunction with the S/A domain, and the artificial separation of these two parts of the signal in the chimeras may explain the difference in the ability of the M2 and NB type III ectodomains to direct the Nexo topology when linked to the HN S/A (M2HH and NBHH). This may also explain the observation that a chimeric protein can adopt dual orientations, a problem not found with naturally occurring proteins. In the case of the type III cytochrome P-450 protein, it has been proposed that membrane topology is determined by a balance between the NH2-terminal charged residues and the length of the hydrophobic signal (Sakaguchi et al., 1992), with proteins in the Nexo topology requiring a longer hydrophobic stretch and fewer positive charges. Therefore, for type III proteins, overlapping signals contributed by both the S/A and the NH2-terminal domain may act together to ensure the precise steps in establishing membrane orientation.
Summary

We have tested the role of different charged residues flanking the sides of the signal/anchor (S/A) domain of a eukaryotic type II (NcytCexo) integral membrane protein in determining its topology. The removal of positively charged residues on the N-terminal side of the S/A yields proteins with an inverted topology, while the addition of positively charged residues to only the C-terminal side has very little effect on orientation. Expression of chimeric proteins composed of domains from a type II protein (HN) and the oppositely oriented membrane protein M2 indicates that the HN N-terminal domain is sufficient to confer a type II topology and that the M2 N-terminal ectodomain can direct a type II topology when modified by adding positively charged residues. These data suggest that eukaryotic membrane protein topology is governed by the presence or absence of an N-terminal signal for retention in the cytoplasm that is composed in part of positive charges.

Introduction 

The signals that direct membrane protein topology are precise, as almost all naturally occurring membrane proteins appear to adopt only one final orientation, which is determined by the amino acid sequence of the polypeptide chain (Blobel, 1980). Integral membrane proteins that span the lipid bilayer a single time can be classified as type I, II, or III (nomenclature of von Heijne, 1988) on the basis of the nature of their hydrophobic domains and their orientation in membranes. Type I proteins contain an N-terminal cleavable signal sequence that targets the nascent polypeptide to the endoplasmic reticulum (ER) membrane (reviewed in Walter and Lingappa, 1986). The final NexoCcyt topology of type I proteins is determined by cleavage of the N-terminal signal sequence by signal peptidase in the ER lumen (Evans et al., 1986), and their translocation across the membrane is halted by a C-terminal hydrophobic stop-transfer region that anchors the polypeptide in the lipid bilayer. Type I proteins constitute the major class of integral membrane proteins that span the membrane once. Type II proteins do not contain a cleavable signal sequence but instead have a long stretch of hydrophobic residues, the signal/anchor domain (S/A), which serves the dual function of targeting and anchoring the polypeptide in the ER membrane with an NcytCexo topology. Examples of type II proteins include the transferrin receptor (Schneider et al., 1984), HLA-associated invariant chain (Strubin et al., 1984), asialoglycoprotein receptor (Spiess and Lodish, 1985), and the paramyxovirus hemagglutinin-neuraminidase (HN) and SH proteins (Hiebert et al., 1985a, 1985b).

The type III proteins contain an internal uncleaved S/A but adopt the NexoCcyt orientation. The known examples constitute a small group, including gp74 v-erbB of avian erythroblastosis virus (Schatzman et al., 1986), erythrocyte sialoglycoprotein β (High and Tanner, 1987), cytochrome P450 (Sato et al., 1990), the influenza A virus M2 protein, and the influenza B virus NB protein (Lamb et al., 1985; Williams and Lamb, 1986). Recent experimental evidence has supported the earlier speculation (von Heijne and Blomberg, 1979; Inouye and Halegoua, 1980; Engelman and Steitz, 1981) that the nascent polypeptide chains of type I and II proteins are inserted into the ER membrane by a common mechanism involving a hairpin loop structure, and that the final topology of these proteins is determined by the presence (type I) or absence (type II) of a site in the N-terminal hydrophobic domain that can be cleaved by signal peptidase (Lipp and Dobberstein, 1986a; Shaw et al., 1988). Although type III proteins, such as the influenza virus M2 protein, appear to share the common SRP-mediated ER targeting mechanism found with type I and II proteins (Lipp and Dobberstein, 1986b; Hull et al., 1988), the detailed steps of their membrane insertion have not been characterized.

We are interested in determining the signals that direct the opposing membrane topologies of eukaryotic type II and type III integral membrane proteins and have used the HN and M2 proteins as models. The findings that the hydrophobic nature of the residues composing an S/A appears to be the only structural requirement for this domain to function in targeting and anchoring a polypeptide (Zerial et al., 1987), and that an S/A domain can be inverted in membranes without loss of function (Parks et al., 1989), suggest that sequences outside of the S/A of type II and III proteins direct membrane orientation. Analysis of the sequences of known membrane proteins led to the proposal of the "positive inside rule" (von Heijne, 1986a; von Heijne and Gavel, 1988), in which membrane proteins orient themselves with the most positively charged end in the cytoplasm. However, a recent comparison of the sequences of eukaryotic type II and III membrane proteins identified a strong correlation between the sum of the charges flanking the S/A of a protein and its membrane topology (Hartmann et al., 1989). It was proposed that the net charge of the 15 residues flanking each side of the S/A directs the orientation of a nascent polypeptide, and that the domain with the more positive overall charge is retained in the cytoplasm. Thus, this "charge difference" hypothesis predicts that it is not the absolute number of positive or negative charges flanking the S/A, but the summed charge on each side, that is important for directing the topology of the protein (Hartmann et al., 1989).
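The charge difference hypothesis reduces to a simple calculation on the two flanking windows. A minimal sketch follows, assuming Arg/Lys = +1 and Asp/Glu = −1 with His and terminal groups ignored, 15-residue windows on each side, and S/A boundaries that are already known; whether Hartmann et al. (1989) counted exactly these residues is an assumption here. The HN WT* NH2-terminal sequence is taken from the HN figures of the preceding article; the neutral C-terminal flank in the usage example is invented for illustration, since that sequence is not given here, and all function names are hypothetical.

```python
# Assumed charge convention: Arg/Lys +1, Asp/Glu -1, His and termini ignored.
CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}
WINDOW = 15  # residues counted on each side of the S/A


def net_charge(seq: str) -> int:
    return sum(CHARGE.get(aa, 0) for aa in seq)


def charge_difference(n_flank: str, c_flank: str, window: int = WINDOW) -> int:
    """Delta(C - N): net charge of the C-terminal window minus the N-terminal one.

    n_flank is the sequence immediately N-terminal to the S/A (read N->C);
    c_flank is the sequence immediately C-terminal to it.
    """
    n_window = n_flank[-window:]  # residues closest to the S/A on the N side
    c_window = c_flank[:window]   # residues closest to the S/A on the C side
    return net_charge(c_window) - net_charge(n_window)


def predicted_topology(n_flank: str, c_flank: str) -> str:
    """The more positive side is predicted to stay cytoplasmic."""
    delta = charge_difference(n_flank, c_flank)
    if delta < 0:
        return "type II (NcytCexo)"
    if delta > 0:
        return "type III (NexoCcyt)"
    return "no prediction (equal flanking charge)"


hn_n_flank = "MVNATEDAPVRATCRVLFR"     # HN WT* NH2-terminal tail
mut24_n_flank = "MVNATEDAPVEATCEVLFE"  # all three Arg -> Glu (mutant 24*)
neutral_c_flank = "SGTSGTSGTSGTSGT"    # invented flank, net charge 0
print(predicted_topology(hn_n_flank, neutral_c_flank))     # type II (NcytCexo)
print(predicted_topology(mut24_n_flank, neutral_c_flank))  # type III (NexoCcyt)
```

Because the rule sees only two summed numbers, it is blind to the positional effects (S/A-proximal versus distal charges) that the experiments in this section were designed to probe.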



Cell 
778 



We report here experiments designed to examine the role of charged residues in determining topology. An HN cDNA clone was systematically altered by site-specific mutagenesis to introduce negatively charged residues into the N-terminal flanking region and positively charged residues into the C-terminal side. Analysis of the topology of the altered proteins expressed in CV-1 cells emphasizes the importance of N-terminal positive charges in the establishment of the HN topology. From analysis of the orientation of various chimeric molecules constructed from domains of HN and M2, we suggest that the establishment of the type II NcytCexo topology depends on the presence of an N-terminal cytoplasmic retention signal, which is composed in part of positively charged residues, and that the opposing HN and M2 orientations are governed by the presence or absence of this N-terminal signal in these two polypeptides.

Results 

Construction of Charge-Altered HN Mutants 

To determine whether a charge difference between the N-terminal and C-terminal sides of the S/A domain is a factor in establishing type II membrane topology, the cDNA clone of the model type II protein HN was systematically mutated by oligonucleotide-directed mutagenesis to generate a series of charge-altered HN proteins (Figure 1A). In this series of mutants, HN residues flanking both sides of the S/A domain were changed separately or in combination such that the sum of the charges within the N-terminal 15 residues was progressively more negative than that of the 15 C-terminal flanking residues. The charge difference rules (Hartmann et al., 1989) predict that each of these HN mutants should adopt an inverted NexoCcyt topology and, because the only sites for N-linked glycosylation are in the C-terminal ectodomain (Hiebert et al., 1985a; Ng et al., 1990), these inverted molecules should be readily distinguishable from proteins with the normal HN orientation by their lack of glycosylation.

Expression of Charge-Altered HN Proteins 

To obtain a high level of expression of the mutant HN proteins, the vaccinia virus system of Fuerst et al. (1986) was employed. CV-1 cells infected with vaccinia virus vTF7-3, which expresses the bacteriophage T7 RNA polymerase, were transfected with plasmid DNAs encoding the mutant proteins under control of the T7 RNA polymerase promoter. After radiolabeling the cells for 1 hr with Tran[35S]label, proteins were immunoprecipitated from cell extracts with HN antisera and examined by SDS-polyacrylamide gel electrophoresis (SDS-PAGE). Using this expression system, wild-type (WT) HN was synthesized as a single polypeptide of Mr ~68,000 (Figure 1B, lane WT).

Expression of the HN mutants produced a protein profile that was significantly different from that of WT HN. The charge-altered mutants were synthesized, to varying degrees, as a mixture of two major polypeptides: a species with an electrophoretic mobility similar to that of WT HN, designated Ncyt, and a faster-migrating form (Mr = 50,000; Figure 1B, lanes 1-9), designated Nexo. Minor polypeptide species migrating faster than the Nexo species are thought to be degradation products of WT HN, as described previously (Ng et al., 1989). After treatment of the proteins with peptide:N-glycosidase F (N-glycanase), each of the mutants was detected as a single polypeptide with an electrophoretic mobility similar to that of the Nexo protein (not shown); this suggests that the Ncyt and Nexo forms are a single polypeptide species that differ from each other by N-linked glycosylation. Further biochemical evidence that the Ncyt and Nexo forms of altered HN molecules are integral membrane proteins with opposing orientations is presented below.

Pulse-chase experiments indicated that within a 1 hr period all forms of the mutant HN were stable (data not shown); thus, quantitation of the amounts of the species that accumulate is a reasonable assay for determining the amounts in each orientation. Densitometric scanning of autoradiograms from several experiments indicated that the fractions of HN mutants 1 and 2 found in the Nexo form were 12% and 30%, respectively (Figure 1A, % Nexo), which suggests that the introduction of negatively charged residues to the N-terminal side of the S/A has an important effect on membrane orientation. In contrast, only 5%-6% of the total HN protein was synthesized as the Nexo species in the case of mutants 3-5, which encode a normal N-terminal domain but are modified by the addition of positively charged residues to the C-terminal side of the S/A. Combinations of N- and C-terminal substitutions (mutants 6-9) had the largest effect on HN orientation: an increasing fraction of the total HN protein was synthesized as the Nexo species when N-terminally altered mutants 1 and 2 were further modified by the addition of positive charges to the C-terminal side of the S/A (Figure 1B, lanes 6-9). A minor species of unknown origin that migrates between the Ncyt and Nexo forms was immunoprecipitated from cells expressing the most highly charge-altered proteins (lanes 7-9), but its presence does not affect the interpretation of the data. The inversion to the Nexo form reached a maximum value of 75% with mutant 9, which encoded N- and C-terminal net charges of -2 and +4, respectively.
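The % Nexo values quoted above are fractions of the total HN signal in a lane. Because Ncyt and Nexo are the same polypeptide differing only in N-linked glycosylation, each form carries the same labeled residues, so their band intensities can be compared directly. The minimal sketch below assumes background-subtracted densitometry values that respond linearly to radioactivity; the function name and example intensities are hypothetical, and minor bands (degradation products, the unidentified intermediate species) are simply left out of the total here as one possible choice.

```python
def percent_nexo(ncyt_signal: float, nexo_signal: float) -> float:
    """Percentage of total HN signal in the unglycosylated Nexo form.

    Inputs are hypothetical background-subtracted band intensities, assumed
    to be proportional to the amount of labeled protein in each band.
    """
    if ncyt_signal < 0 or nexo_signal < 0:
        raise ValueError("band intensities must be non-negative")
    total = ncyt_signal + nexo_signal
    if total == 0:
        raise ValueError("no HN signal in this lane")
    return 100.0 * nexo_signal / total


if __name__ == "__main__":
    # Hypothetical lane with 3 units of Ncyt and 1 unit of Nexo signal.
    print(percent_nexo(3.0, 1.0))  # 25.0
```

Because the pulse-chase data indicate that both forms are stable over the 1 hr labeling period, this ratio can be read as the fraction of molecules inserted in each orientation rather than as a difference in turnover.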

These data suggest that the normal HN orientation can be disrupted by alterations in charged residues flanking the S/A domain, and that proteins can be produced that adopt more than one orientation. However, our data do not fulfil the predictions of the charge difference rules (Hartmann et al., 1989), as only proteins containing mutations on the N-terminal side of the S/A (mutants 1, 2, and 6-9) were significantly inverted in the membrane, whereas the topology of the mutants altered only on the C-terminal side of the S/A (mutants 3-5) remained largely unaltered.

Biochemical Evidence for the Orientation 
of Charge-Altered HN Proteins 

It was inferred from the electrophoretic mobility of the Nexo protein that the C-terminal domain of these molecules, which contains the sites for N-linked glycosylation, had not been translocated across the ER membrane. However, it was important to provide evidence that the function of the S/A domain had not been abrogated and that these unglycosylated molecules were stably anchored in the membrane (NexoCcyt orientation) and were not soluble cytoplasmic proteins. Microsomal membranes were prepared from vTF7-3-infected cells that had been transfected with plasmids encoding WT HN or mutants 2, 6, or 7, and the microsomes were treated with pH 11 buffer. Under these conditions, integral membrane proteins remain associated with the lipid bilayer and after centrifugation are found in the pellet fraction, while soluble proteins are found in the supernatant fraction (Steck and Yu, 1973). As shown in Figure 2A, both the Ncyt and the Nexo protein species fractionated like WT HN, as the majority of the protein was detected in the pellet fraction (P) and only trace amounts were found in the supernatant (S). Thus, these data strongly suggest that the function of the S/A domain in targeting the proteins to the ER and anchoring the proteins in membranes had not been affected.

Figure 1. Construction and Expression of Charge-Altered HN Proteins

(A) Schematic diagram of the charge-altered HN proteins. The 17 amino acid residues flanking the N- (left) and C-terminal (right) sides of the S/A (cross-hatched box) of WT HN are shown. Solid horizontal lines denote sequence identity of mutants 1-9 with WT HN, and substitutions are shown below their position in the HN sequence. a: sum of charged residues within the 15 amino acids flanking the S/A domain; N, N-terminal; C, C-terminal. b: difference in the sum of charged residues on the N- and C-terminal sides of the S/A. c: percentage of the total HN protein accumulated in the unglycosylated Nexo form after a 1 hr labeling period.

(B) Expression of charge-altered HN proteins. CV-1 cells infected with vaccinia virus vTF7-3 were transfected with plasmids encoding WT HN or mutants 1-9 and radiolabeled for 1 hr with Tran[35S]label. Proteins were immunoprecipitated from cell lysates with HN antisera and analyzed by SDS-PAGE. Ncyt and Nexo denote forms of HN as described in the text.

Figure 2. Biochemical Analysis of Microsomal Membranes from Cells Expressing Charge-Altered HN Proteins

Vaccinia virus vTF7-3-infected cells were transfected with plasmids encoding WT HN or mutants 2, 6, or 7. Cells were radiolabeled with Tran[35S]label from 3.5-4.5 hr posttransfection, and crude microsomal membranes were prepared.

(A) Alkali fractionation. Microsomal membranes were incubated for 30 min at 4°C with buffer (pH 11) and fractionated by centrifugation. Equal portions of the resulting pellet (P) or supernatant (S) were neutralized and immunoprecipitated with HN antisera, and the polypeptides were analyzed by SDS-PAGE.

(B) Protease digestion. Samples were treated with buffer (- lanes) or with 20 μg/ml trypsin (+ lanes). After 45 min at 37°C, microsomal membranes were isolated by centrifugation, and the proteins were immunoprecipitated with HN antisera before analysis by SDS-PAGE. Ncyt and Nexo are forms of HN as described in the text.

To provide direct biochemical evidence for the topology of the mutant proteins, microsomal membranes isolated from vaccinia vTF7-3-infected cells expressing WT HN or several representative mutants were treated with trypsin, and the protected protein fragments were analyzed by immunoprecipitation with HN antisera and SDS-PAGE. Microsomal membranes from cells expressing WT HN or mutants 2 and 7 protected the Ncyt species from trypsin digestion, whereas the Nexo form was accessible to added protease (Figure 2B, + lanes). These results suggest that the Ncyt species has a type II orientation and that the vast majority of the Nexo polypeptide chain is located on the cytoplasmic side of the membrane.

To provide evidence that the N-terminal domain of the HN Nexo species was translocated across the ER membrane and not held in a loop formation, a site for the addition of N-linked glycosylation was added to the N-terminal domain of WT HN and two of the charge-altered mutants by site-specific mutagenesis (Figure 3). It was anticipated that glycosylation of the N-terminal domain of the Nexo species would result in a slower electrophoretic mobility than that of the unglycosylated Nexo protein, while the mobility of the Ncyt species would not be altered. Vaccinia virus vTF7-3-infected cells were transfected with plasmids encoding these N-terminal mutants and labeled for 1 hr with Tran[35S]label. Proteins were immunoprecipitated from cell extracts, incubated with (+) or without (-) N-glycanase, and examined by SDS-PAGE. The mutant HN WT* contains the new N-terminal site for N-linked glycosylation, and expression of HN WT* results in the synthesis of a single major polypeptide (Figure 3, WT* lanes). Thus, the addition of the two new amino acid residues to form the N-terminal glycosylation site did not influence HN orientation. Two polypeptide species were identified with HN mutants 1* and 2* (- lanes, Ncyt and Nexo), both of which had a slower electrophoretic mobility than the single polypeptide species found after treatment of the proteins with N-glycanase (+ lanes). The small mobility difference of ~5 kd between the Nexo species (- lanes) and the deglycosylated protein (+ lanes) suggested that the Nexo polypeptides had been modified by the addition of carbohydrate, and the shift in mobility is consistent with the use of the new N-terminal glycosylation site. Furthermore, the relative abundance of the singly glycosylated 1* and 2* Nexo forms (10% and 30%) correlates well with the amounts of their unglycosylated HN counterparts seen in Figure 1 (12% and 30%). Taken together, these biochemical data indicate that the mutant Nexo species represents an integral membrane protein with a large C-terminal cytoplasmic region and a small N-terminal domain in the ER lumen; thus, these molecules are the result of a bona fide inversion of the HN type II topology.

Figure 3. Glycosylation of the Mutant HN N-Terminal Domains

Vaccinia vTF7-3-infected CV-1 cells were transfected with plasmid DNAs encoding derivatives of the WT and mutant HN proteins altered to contain an N-terminal glycosylation site (*). Polypeptides were radiolabeled from 3.5-4.5 hr posttransfection with Tran[35S]label and immunoprecipitated with HN antisera. Immune complexes were divided into two portions and incubated with (+) or without (-) N-glycanase, and the polypeptides were analyzed by SDS-PAGE. The fraction of the total HN protein in the Nexo orientation is shown (% Nexo). The N-terminal amino acids in the mutants are listed, with solid horizontal lines indicating sequence identity with HN WT*. Note that HN WT* contains two extra N-terminal residues (N and T) to create the site for N-linked glycosylation. S/A, HN signal/anchor domain.

Additional HN mutants (Figure 3, 10*-13*) were constructed to determine whether an arginine (R) residue directly flanking the HN N-terminal side of the S/A was required for the establishment of the NcytCexo topology. The HN WT* cDNA was altered by mutagenesis such that a negatively charged glutamic acid (E), a positively charged lysine (K), an uncharged glutamine (Q), or a histidine (H) residue (the latter of which can be weakly positively charged, depending on the intracellular pH) replaced the R residue that normally flanks the S/A domain. These mutants were expressed from plasmids and analyzed as described above for mutants HN WT*, 1*, and 2*. As shown in Figure 3, approximately 10% of the mutants containing the E, Q, or H substitution were found in the Nexo form, and this value was very similar to that obtained with the 1* mutant. Expression of the K substitution mutant 11* (lane 11*) resulted in a polypeptide mobility pattern that was indistinguishable from that of HN WT*, indicating that a positively charged residue directly flanking the N-terminal side of the S/A is very important for establishing the HN type II topology.

The HN N-Terminal Domain Directs the Inversion
of M2 into a Type II Topology

The above data suggest that the HN N-terminal domain plays a critical role in governing the type II orientation and that this region may contain a signal that is incompatible with its translocation across the membrane. A prediction of this proposal would be that the type III NexoCcyt-oriented M2 protein would lack this N-terminal retention signal, but that transfer of the HN N-terminal domain to the M2 protein should direct an inversion of M2 to the type II topology. To test this prediction, a hybrid cDNA molecule was constructed (Figure 4, HgMM) such that it encoded the HN WT* N-terminal 19 residues linked precisely to the M2 S/A and cytoplasmic domains. The addition of carbohydrate residues to this chimera would indicate that the HN N-terminal domain, which contains the only site for N-linked glycosylation, has been translocated across the ER membrane. Vaccinia virus vTF7-3-infected cells were transfected with plasmids encoding the HgMM chimera or M2g, a modified version of the M2 protein that contains an N-terminal site for N-linked glycosylation to facilitate the analysis of M2 membrane topology (Parks et al., 1989). The cells were labeled with [35S]cysteine and [35S]methionine for 2 hr, and the proteins were immunoprecipitated with M2-specific antisera, incubated with (+) or without (-) N-glycanase, and examined by SDS-PAGE.

Figure 4. Transfer of the HN N-Terminal Domain to M2 Results in a Type II Chimeric Polypeptide

(A) Expression of M2g, HgMM, and HgMM.1. Vaccinia vTF7-3-infected CV-1 cells were transfected with plasmids encoding M2g, HgMM, or HgMM.1 and radiolabeled for 2 hr with [35S]methionine and [35S]cysteine. Samples were immunoprecipitated from cell extracts with M2 antisera, incubated with (+) or without (-) N-glycanase, and analyzed by SDS-PAGE. Lane M, influenza A virus-infected cell polypeptides as a marker; the fastest-migrating species is authentic M2 polypeptide.
(B) Alkali extraction of membranes from cells expressing M2g or HgMM. Vaccinia virus vTF7-3-infected cells were transfected with plasmids encoding M2g or HgMM and were radiolabeled for 2 hr with [35S]methionine and [35S]cysteine. Crude microsomal membranes were prepared, incubated with pH 11.0 buffer, and fractionated by centrifugation. Equal portions of the resulting supernatant (S) or pellet (P) fractions were immunoprecipitated with antisera, and the samples were examined by SDS-PAGE. The N-terminal amino acid sequences of the mutants are listed, with the HgMM.1 N-terminal horizontal line denoting sequence identity with HgMM (Hg is identical to HN WT*). S/A, M2 signal/anchor domain.

The M2g protein was synthesized as a major species (Figure 4A, M2g, - lane) that has a slower electrophoretic mobility than the N-glycanase-treated protein (+ lane); this is consistent with the known NexoCcyt topology of M2 (Lamb et al., 1985). The M2g protein was observed to migrate as a doublet; this may reflect differential processing of the carbohydrate residues. In contrast, only Q% of the HgMM protein was glycosylated, and the vast majority of the HgMM protein was synthesized as an unglycosylated polypeptide (HgMM, - lane) exhibiting an electrophoretic mobility indistinguishable from that of the N-glycanase-treated sample (+ lane). Alkali extraction of microsomal membranes isolated from cells expressing M2g or HgMM showed that both of these polypeptides were strongly associated with the membrane, as they were found in the pellet fraction and not in the supernatant (Figure 4B). Thus, these data indicate that the vast majority of the chimeric HgMM protein is oriented as a type II protein. Parenthetically, the observed type II topology of HgMM differs from that predicted by the charge difference rules (Hartmann et al., 1989), as the sums of the charges flanking the HgMM S/A on the N- and C-terminal sides are +1 and +1.5, respectively.

The results obtained with HN mutants 10*-13* indicate that a positive charge immediately flanking the HN S/A is important for establishing a type II orientation. A charge-altered form of the HgMM chimera (HgMM.1, Figure 4) was constructed to test whether a positive charge flanking the N-terminal side of the S/A was also a factor in establishing the HgMM NcytCexo topology. The HgMM.1 mutant, which encoded the same L to E and R to S mutations previously analyzed in HN mutant 1*, was expressed and analyzed as described above for M2g and HgMM. As shown in Figure 4, this charge-altered chimera (HgMM.1, - lane) was synthesized as a mixture of a slow-migrating glycosylated form and a faster-migrating species with a mobility matching that of the N-glycanase-treated sample (+ lane). In contrast to the predominant type II orientation of the unaltered HgMM hybrid, approximately 60% of the modified HgMM.1 protein was found to be glycosylated and thus must be in a NexoCcyt topology. Together, these data suggest that the HN N-terminal region can direct an inversion of the M2 polypeptide from the NexoCcyt topology to that of a type II protein, and that this efficient inversion is disrupted by the removal of N-terminal positively charged residues.

Membrane Protein Orientation Signals
783

Figure 5 panels: MgHH, MgHH.1, and MgHH.2, each with - and + N-glycanase lanes.

Figure 5. Positive N-Terminal Charges Convert the MgHH Chimeric Protein to a Type II Orientation

CV-1 cells infected with vaccinia vTF7-3 were transfected with plasmid DNA encoding MgHH, MgHH.1, or MgHH.2 and radiolabeled for 1 hr with Tran[35S]label. HN proteins were immunoprecipitated from cell extracts with HN antisera, incubated with (+) or without (-) N-glycanase, and the polypeptides were examined by SDS-PAGE. The N-terminal amino acids in the mutants are listed, with horizontal lines denoting sequence identity with M2g. S/A, HN signal/anchor domain.

Ability of the M2 N-Terminal Domain to Direct
the Type II Topology

We were interested in determining whether the reciprocal experiment to that described in the section above could be performed, i.e., to convert a type III domain into a type II Ncyt domain by the addition of positively charged residues. We have previously described the properties of a chimeric M2/HN protein containing the M2g N-terminal 24 residues linked precisely to the HN S/A and C-terminal domains (Parks et al., 1989). This MgHH hybrid is synthesized as a single polypeptide chain that adopts two opposing orientations in membranes, with approximately 60% of the protein in the faster-migrating form (Figure 5, MgHH panel). Minor faster-migrating species are degradation products of the HN ectodomain (Parks et al., 1989; Ng et al., 1989). The effect of the addition of positively charged N-terminal residues on the orientation of this hybrid was examined by constructing two charge-altered MgHH mutants.

In MgHH.1, a single R residue was substituted for the M2g N-terminal serine at amino acid residue 23, and MgHH.2 encoded a substitution of the two negatively charged M2 aspartic acid (D) residues at positions 21 and 24 with glycine (G) and arginine (R), respectively (Figure 5). The rationale for the addition of a G residue at position 21 in MgHH.2 was based on the finding that this is a naturally occurring change in the N-terminal ectodomain between the M2 proteins of the Udorn/72 and PR/8/34 strains of influenza A virus (Lamb and Lai, 1981; Lamb et al., 1985). Thus, it is known that a D to G substitution at this position does not alter the M2 protein orientation but would contribute generally to the N-terminal charge distribution. CV-1 cells infected with vaccinia vTF7-3 were transfected with plasmids encoding the altered MgHH hybrids and labeled for 1 hr with Tran[35S]label. Proteins were immunoprecipitated with HN antisera, incubated with (+) or without (-) N-glycanase, and analyzed by SDS-PAGE. As shown in Figure 5, the MgHH.1 mutant was synthesized as two major species (Figure 5, panel MgHH.1, - lane) that were converted to a single faster-migrating form after N-glycanase treatment (+ lane), and 40% of this modified chimera was in the Nexo orientation. In contrast, the MgHH.2 mutant, which contained a positively charged arginine residue flanking the S/A, was predominantly in the type II Ncyt orientation, with only 3% of the protein in the Nexo topology (Figure 5, panel MgHH.2, - lane). Thus, these data indicate that the addition of positively charged residues to the M2 N-terminal ectodomain next to the S/A domain alters this region such that it can adopt a type II topology.

Discussion 

We wished to test the role of charged residues flanking the S/A domain in determining orientation because the biochemical mechanism involved in generating the topology of eukaryotic membrane proteins with an internal uncleaved S/A has not been established. For the purposes of discussion, the boundaries of the S/A domain are defined as the first charged residue in each direction from the middle of the first hydrophobic domain. The signals directing the orientation of proteins in the ER membrane can be thought of in simple terms as acting either to promote the translocation of the N-terminus of a type III protein across the membrane or to retain the N-terminus of type II proteins in the cytoplasm; alternatively, both signals could exist, with one being dominant. Our data emphasize the importance of N-terminal positive charges in generating the type II orientation. Removal of positively charged residues from the Ncyt domain resulted in some of the HN molecules assuming an inverted orientation in membranes. However, the fact that the inversion was not absolute suggests that the absence of a positively charged residue is not the sole factor involved in generating the type III orientation. In part, the mixed orientation of the chimeric proteins (i.e., MgHH) before the charges were altered may reflect difficulties involved with using chimeric proteins rather than naturally existing proteins. Interestingly, the addition of charges to the C-terminal side of the HN S/A domain in the absence of the N-terminal positively charged residue resulted in more efficient inversion, as discussed further below. Previously, it has been found that the addition of N-terminal positively charged residues inverts the type III cytochrome P450 protein, but because of exposure of a cryptic site for cleavage by signal peptidase it becomes a secreted protein (Szczesna-Skorupa et al., 1988; Sato et al., 1990). In addition, it has been found that by switching domains in chimeric proteins, which leads to alterations in the positions of charged residues, membrane topology can be altered both in vitro and in vivo (Haeuptle et al., 1989; Parks et al., 1989).
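The working definition of the S/A boundaries given above (the first charged residue in each direction from the middle of the first hydrophobic domain) amounts to a simple outward scan. The Python sketch below assumes the midpoint of the hydrophobic domain is already known (locating it, for example from a hydropathy plot, is outside the sketch) and treats Lys, Arg, Asp, and Glu as charged; because the text does not say whether His counts, that is left as a parameter. The function name and toy sequences are hypothetical.

```python
from typing import Optional, Tuple

CHARGED_DEFAULT = frozenset("KRDE")  # assumption: His excluded unless requested


def sa_bounds(seq: str, midpoint: int,
              include_his: bool = False) -> Tuple[Optional[int], Optional[int]]:
    """Return 0-based indices of the first charged residue on each side of midpoint.

    The residue at `midpoint` itself is not examined. None means no charged
    residue was found in that direction. The S/A is the stretch strictly
    between the two returned positions.
    """
    charged = CHARGED_DEFAULT | ({"H"} if include_his else set())
    left = next((i for i in range(midpoint - 1, -1, -1) if seq[i] in charged), None)
    right = next((i for i in range(midpoint + 1, len(seq)) if seq[i] in charged), None)
    return left, right


if __name__ == "__main__":
    toy = "MSKR" + "LLAVILLGLA" + "DEST"  # hypothetical sequence
    left, right = sa_bounds(toy, 8)
    print(left, right, toy[left + 1:right])  # 3 14 LLAVILLGLA
```

Under this definition the residue bounding the S/A on its N-terminal side is exactly the position probed by mutants 10*-13*, so whether that residue is positively charged can be read off directly from `seq[left]`.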

We favor the idea that the N-terminal positively charged residue flanking the S/A domain is an important part of a dominantly acting retention signal that retains the N-terminus of the nascent polypeptide chain in the cytoplasm to create the type II orientation, and that this retention signal is not present in the N-terminal domain of type III proteins. This conclusion is based on several lines of evidence in addition to the data obtained with the N-terminal charge-altered mutants. First, linking the HN N-terminal domain to the M2 S/A and C-terminal regions produces a chimeric protein (HgMM) that largely adopts the HN topology, indicating that the dominant determinant of type II topology had been transferred to M2 and that this HN signal could efficiently override any possible topological signals in the M2 S/A and cytoplasmic domains. Second, the M2 N-terminal ectodomain, although only 60% efficient at directing the MgHH chimera into the type III orientation, can be altered to efficiently direct the MgHH chimera into the type II orientation when positive charges are introduced into the N-terminal S/A-flanking positions. This suggests that the nature of the sequence that comprises a cytoplasmic domain is less critical for generating type II topology than the presence of an appropriately positioned positively charged residue, and that it is possible to create the signal that specifies type II topology.

These data support the "positive inside" rule proposed previously (von Heijne and Gavel, 1988), for which evidence has recently been provided in the case of a bacterial membrane protein (Nilsson and von Heijne, 1990), in that positive charges are an important factor directing HN membrane topology. However, the orientation of the HN protein is more sensitive to the removal of N-terminal positive charges than to the addition of C-terminal positive charges, which indicates that the topology of eukaryotic type II proteins is not determined simply by the retention of the most positively charged domain. Once the N-terminal positive charge has been removed, the subsequent addition of positive charges to the C-terminal side of the S/A may operate to keep this domain in the cytoplasm (Figure 1, mutants 6-9). Thus, eukaryotes and prokaryotes may share a common mechanism for generating membrane protein topology by which charged residues provide a barrier to translocation, but their mechanisms may differ in the relative importance of N-terminal positive charges.

It was originally suggested on theoretical grounds, and then supported experimentally, that the signal sequence of type I and II proteins is inserted into the ER membrane as a hairpin loop with both the N- and C-terminal regions located in the cytoplasm (von Heijne and Blomberg, 1979; Inouye and Halegoua, 1980; Engelman and Steitz, 1981; Shaw et al., 1988). As the insertion of type III membrane proteins into membranes depends on recognition of the S/A by the signal recognition particle (Hull et al., 1988), the nascent type III chain probably also adopts a loop structure. However, after membrane insertion as a hairpin loop, the critical step in generating the type III topology involves the translocation of the N-terminal domain across the lipid bilayer. The N- to C-terminal polarity of protein synthesis implies that the N-terminal region flanking the S/A of a nascent polypeptide would be exposed to the translocation machinery before the C-terminal flanking region is fully exposed, and it has been suggested that the transfer of the type III N-terminal domain across the membrane may occur very fast relative to the rate of translation (von Heijne, 1986b). In contrast, the presence of a positive-charge signal in the N-terminal region of the nascent polypeptide chain of type I proteins and the mature polypeptide chain of type II proteins imparts cytoplasmic retention of this domain, and the C-terminal region is translocated. Although the topology of the immature type I and mature type II proteins may ultimately depend on the presence or absence of an available site for cleavage by signal peptidase (Lipp and Dobberstein, 1986a; Shaw et al., 1988), what distinguishes them from type III proteins is that during synthesis the N-terminus is retained in the cytoplasm.

It is not known whether retention of the N-terminal domain of nascent type I and mature type II proteins is due to binding by cytoplasmic factors or whether a local electrical potential across the membrane makes translocation of this region thermodynamically unfavorable (Weinstein et al., 1982). The translocation of a polypeptide chain into the ER could occur, at least in theory, by direct transfer through the hydrophobic environment of the lipid bilayer (von Heijne and Blomberg, 1979; Engelman and Steitz, 1981) or through a protein pore in the membrane (Blobel and Dobberstein, 1975; Gilmore and Blobel, 1985), but recent evidence suggests that during translocation the nascent chain is associated with distinct membrane-bound proteins (Connolly et al., 1989; Nicchitta and Blobel, 1989). In the case of prokaryotes, it has been suggested that the Escherichia coli SecA protein directs protein translocation by recognizing N-terminal positive charges within a signal sequence (Akita et al., 1990), and it seems possible that an analogous protein may operate similarly in eukaryotes. The ability to reconstitute membrane translocation in vitro from disrupted microsomes (Nicchitta and Blobel, 1990) may provide the means to separate and assess the role of individual components of the translocation machinery in directing membrane topology.

Experimental Procedures 

Plasmid Construction and Site-Specific Mutagenesis

cDNA clones that express wild-type SV5 HN (pSV103HNm; Hiebert et al., 1985a; Paterson et al., 1985) and M2g, a derivative of influenza A virus M2 containing an N-terminal site for N-linked glycosylation (Parks et al., 1989), were used as a source of starting materials for the construction of the altered genes. Bacteriophage M13M2g (containing the entire M2g cDNA in the BamHI site of the replicative form of M13mp19) and M13HN (containing 5' nucleotides 1-306 and encoding N-terminal residues 1-61 from the HN gene) were used as template DNA for oligonucleotide-directed mutagenesis as described previously (Parks et al., 1989). Oligonucleotides were synthesized by the Northwestern University Biotechnology Facility on a DNA synthesizer (Model 380B, Applied Biosystems Inc., Foster City, CA).

Mutant HN DNA segments were excised from the replicative form of M13 by EcoRI and PstI digestion and subcloned into a pGem vector. Their nucleotide sequences were confirmed by dideoxynucleotide chain-terminating sequencing (Sanger et al., 1977). The altered 5' end DNA fragments were then reconstructed into a full-length gene by linkage to the HN PstI-XhoI fragment (encoding residues 82-565) in pGem3. In this arrangement, mRNA sense transcripts could be generated from the bacteriophage T7 RNA polymerase promoter using T7 RNA polymerase. pGem-HN WT* encodes an N-terminal site for N-linked glycosylation (Asn-Ala-Thr). It was constructed by inserting codons for Asn and Thr between HN bases 72-73 and 75-76, respectively.

The HgMM gene was constructed by using oligonucleotide-directed mutagenesis to introduce a new StuI site (AGGCCT) into the HN WT* (bases 115-120) and M2 (bases 95-100) cDNA fragments. This site encodes the junction (Arg-Pro) between the HN N-terminal domain and the M2 S/A domain. The EcoRI-StuI HN WT* fragment and the StuI-PstI M2 fragment were joined by blunt-end ligation into the EcoRI and PstI sites of pGem3. The resulting DNA segment encoded HN WT* N-terminal residues 1-19 linked precisely to the M2 S/A and C-terminal domains (residues 25-52; Lamb et al., 1985). HgMM.1 was constructed in the same way, by blunt-end ligation of the HN mutant 1* EcoRI-StuI fragment and the M2 StuI-PstI fragment into pGem3. The construction and characterization of MgHH have been described previously (Parks et al., 1989). MgHH.1 and MgHH.2 were constructed by site-specific mutagenesis as described (Parks et al., 1989). Nucleotide sequences were confirmed by dideoxynucleotide chain-terminating sequencing (Sanger et al., 1977).

Cells 

Monolayer cultures of CV-1 cells were grown in Dulbecco's modified Eagle's medium containing 10% fetal calf serum as described (Lamb and Lai, 1982).

Isotopic Labeling of Polypeptides, Immunoprecipitation,
Peptide:N-Glycosidase F Digestions, and
Polyacrylamide Gel Electrophoresis

cDNA clones were expressed using a modification of the vaccinia virus-T7 RNA polymerase system of Fuerst et al. (1986). Confluent monolayers of CV-1 cells (6 cm diameter plates) were infected with recombinant vaccinia virus vTF7-3, which encodes the bacteriophage T7 RNA polymerase, at a multiplicity of infection of approximately 20. The inoculum was removed, and calcium phosphate-precipitated plasmid DNA (… µg) was added.

Cells transfected with plasmids encoding HN mutants were radiolabeled from 3.5-4.5 hr posttransfection with 20-50 µCi/ml Tran[35S]label (ICN Radiochemicals Inc., Irvine, CA) in Dulbecco's modified Eagle's medium lacking cysteine and methionine. The radiolabeled cells were washed in phosphate-buffered saline and lysed in 1% SDS. Proteins were immunoprecipitated from the cell extracts with polyclonal rabbit sera to denatured HN (HN antisera) as described (Ng et al., 1990; Erickson and Blobel, 1979).

Cells transfected with plasmids encoding M2g or the HN/M2 hybrids were radiolabeled from 3.5-5.5 hr posttransfection with a mixture of [35S]cysteine and [35S]methionine (125 µCi/ml each). These cells were solubilized in cold RIPA buffer (Lamb et al., 1978), and the proteins were immunoprecipitated with polyclonal sera raised to denatured M2 (DM2 antisera; Lamb et al., 1985).

Samples were treated with peptide:N-glycosidase F as described previously (Williams and Lamb, 1986). HN proteins were analyzed by SDS-PAGE on 10% polyacrylamide gels. M2g and HN/M2 hybrid proteins were analyzed on 17.5% gels containing 4 M urea. Gels were then processed for fluorography (Lamb and Choppin, 1976). Autoradiograms were scanned with an LKB Ultrascan XL laser densitometer (Pharmacia-LKB, Bromma, Sweden). Each reported %Nexo value is the average of at least two experiments.

Trypsin Digestion and Alkali Extraction
of Microsomal Membranes

Vaccinia virus vTF7-3-infected cells were transfected with plasmid DNAs and then radiolabeled. Cells expressing HN proteins were labeled from 3.5-4.5 hr post-DNA transfection with 20 µCi/ml Tran[35S]label. Cells expressing M2g and HN/M2 proteins were labeled from 3 to 5 hr posttransfection with 250 µCi/ml [35S]cysteine and [35S]methionine. Crude microsomal membranes were then prepared by Dounce homogenization (Adams and Rose, 1985). Samples were analyzed by trypsin digestion or alkali fractionation as described previously (Parks et al., 1989).
ABSTRACT We have developed three computer programs for comparing protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.

The FASTA program is a more sensitive derivative of the FASTP program. It can search protein or DNA sequence data bases, and it can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA adds a step to the calculation of the initial pairwise similarity score. This step allows multiple regions of similarity to be joined, which increases the score of related sequences.

The RDF2 program evaluates the significance of similarity scores using a shuffling method that preserves local sequence composition.

The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold. It uses the same scoring parameters as FASTA and a similar alignment algorithm. These local similarities can be displayed as a "graphic matrix" plot or as individual alignments.

In addition, these programs have been generalized to compare DNA or protein sequences using a variety of alternative scoring matrices.



We have been developing tools for the analysis of protein and DNA sequence similarity. These tools balance sensitivity and selectivity on the one hand against speed and memory requirements on the other.

Three years ago, we described the FASTP program for searching amino acid sequence data bases (1). FASTP uses a rapid technique for finding identities shared between two sequences and exploits the biological constraints on molecular evolution. It has decreased the time required to search the National Biomedical Research Foundation (NBRF) protein sequence data base by more than two orders of magnitude. Many investigators have used it to find biologically significant similarities to newly sequenced proteins.

Biological sequence comparison involves a trade-off between sensitivity and selectivity. Methods that can detect more distantly related sequences (increased sensitivity) frequently increase the similarity scores of unrelated sequences (decreased selectivity). In this paper we describe FASTA, a new version of FASTP. FASTA uses an improved algorithm that increases sensitivity with a small loss of selectivity and a negligible decrease in speed. We have also developed a related program, LFASTA, for local similarity analyses of DNA or amino acid sequences. These programs run on commonly available microcomputers as well as on larger machines.

METHODS 

Our search algorithm determines a score for pairwise similarity in four steps.

FASTP and FASTA achieve much of their speed and selectivity in the first step. Here a lookup table is used to locate all identities or groups of identities between two DNA or amino acid sequences (2). The ktup parameter sets how many consecutive identities are required in a match. For example, if ktup = 4 for a DNA sequence comparison, only identities that occur in a run of four consecutive matches are examined. This step finds the 10 best diagonal regions using a simple formula based on the number of ktup matches and the distance between them (1, 3). Shorter runs of identities, conservative replacements, insertions, and deletions are not considered at this stage.
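A minimal Python sketch of the first step follows. It uses a plain dictionary as the lookup table and scores each diagonal only by its count of ktup matches. The distance-between-matches term of the published formula is omitted, and the function names are hypothetical.

```python
from collections import defaultdict


def ktup_diagonal_counts(query, library, ktup):
    """Count ktup-word matches on each diagonal d = i - j
    (i indexes the query, j indexes the library)."""
    # Lookup table: each ktup-word of the query -> its start positions.
    table = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        table[query[i:i + ktup]].append(i)

    # One pass over the library; every word hit votes for its diagonal.
    counts = defaultdict(int)
    for j in range(len(library) - ktup + 1):
        for i in table.get(library[j:j + ktup], ()):
            counts[i - j] += 1
    return dict(counts)


def best_diagonals(query, library, ktup, n_best=10):
    """Diagonals with the most ktup matches, best first."""
    counts = ktup_diagonal_counts(query, library, ktup)
    return sorted(counts, key=counts.get, reverse=True)[:n_best]


if __name__ == "__main__":
    print(ktup_diagonal_counts("GACTCAGAAAG", "TTGACTCAGAAAG", ktup=4))
    # {-2: 8}
```

The table is built once per query, so each library sequence is scanned in a single pass with constant-time lookups. This is where the speed of the first step comes from.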

In the second step, we rescore these 10 regions using a scoring matrix that lets conservative replacements and runs of identities shorter than ktup contribute to the similarity score. For protein sequences, this score is usually calculated with the PAM250 matrix (4). FASTA can also use other scoring matrices, for example matrices based on the minimum number of base changes required for a replacement or on an alternative measure of similarity. For each of these best diagonal regions, the subregion with maximal score is identified. We refer to this subregion as the "initial region." The best initial regions from Fig. 1A are shown in Fig. 1B.
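Rescoring a single diagonal can be sketched as a maximal-subsegment scan. This sketch assumes an ungapped diagonal and uses a hypothetical toy DNA scoring function (+5 for a match, -4 for a mismatch) in place of PAM250. The names are illustrative.

```python
def toy_score(a, b):
    """Hypothetical DNA scoring: +5 for an identity, -4 otherwise."""
    return 5 if a == b else -4


def best_subregion(query, library, diag, score=toy_score):
    """Maximal-scoring ungapped subregion on diagonal diag = i - j.

    Returns (score, start, end) in query coordinates, end exclusive.
    """
    i = max(diag, 0)          # first query position on this diagonal
    j = i - diag              # matching library position
    best = (0, i, i)
    run, run_start = 0, i
    while i < len(query) and j < len(library):
        run += score(query[i], library[j])
        if run <= 0:
            # A non-positive prefix can never help; restart after it.
            run, run_start = 0, i + 1
        elif run > best[0]:
            best = (run, run_start, i + 1)
        i += 1
        j += 1
    return best


if __name__ == "__main__":
    print(best_subregion("TTACGTACGTTT", "GGACGTACGTGG", diag=0))
    # (40, 2, 10)
```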

FASTP uses the single best scoring initial region to characterize pairwise similarity and ranks the library sequences by these initial scores. During a library search, FASTA adds a third step that checks whether several initial regions can be joined together. FASTA takes the locations of the initial regions, their respective scores, and a "joining" penalty (analogous to a gap penalty). From these it calculates an optimal alignment of initial regions: the combination of compatible regions with maximal score. FASTA ranks the library sequences by this joined score. To limit the loss of selectivity, only initial regions whose scores exceed a threshold are included in this optimization.

The process can be seen by comparing Fig. 1B with Fig. 1C. Fig. 1B shows the 10 highest scoring initial regions after rescoring with the PAM250 matrix, with the best initial region reported by FASTP marked by an asterisk. Fig. 1C shows an optimal subset of initial regions that can be joined to form a single alignment.
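The joining step can be sketched as a small dynamic program over initial regions. One region may follow another only if it starts after the previous one ends in both sequences. The threshold of 27 is the value quoted in the Fig. 1 legend. The joining penalty of 20 and the `Region` layout are assumptions for illustration, not FASTA's actual parameters.

```python
from typing import NamedTuple


class Region(NamedTuple):
    q_start: int  # query start (inclusive)
    q_end: int    # query end (exclusive)
    l_start: int  # library start (inclusive)
    l_end: int    # library end (exclusive)
    score: int


def joined_score(regions, join_penalty=20, threshold=27):
    """Best score of a chain of compatible initial regions.

    Regions scoring below `threshold` are ignored; each link between two
    regions in the chain costs `join_penalty`.
    """
    usable = sorted((r for r in regions if r.score >= threshold),
                    key=lambda r: (r.q_start, r.l_start))
    best = []  # best[k]: best chain score ending with usable[k]
    for k, r in enumerate(usable):
        links = [best[m] - join_penalty for m in range(k)
                 if usable[m].q_end <= r.q_start
                 and usable[m].l_end <= r.l_start]
        best.append(r.score + max([0] + links))
    return max(best, default=0)


if __name__ == "__main__":
    regions = [Region(0, 10, 0, 10, 40), Region(15, 25, 18, 28, 30),
               Region(5, 12, 30, 37, 20)]  # last one is below threshold
    print(joined_score(regions))  # 40 + 30 - 20 = 50
```

Raising the penalty to 50 makes joining unprofitable, and the result falls back to the best single region (40). In this way the penalty and threshold together trade sensitivity against selectivity.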

In the fourth step, the highest scoring library sequences are aligned using a modification of the optimization method described by Needleman and Wunsch (5) and Smith and Waterman (6). This final comparison considers all possible alignments of the query and library sequence that fall within a band centered on the highest scoring initial region (Fig. 1D). With the FASTP program, this optimization frequently improved the similarity scores of related sequences by factors of 2 or 3. FASTA instead calculates its initial similarity score from an optimization of initial regions during the library search, so for many sequences the initial score is much closer to the optimized score. In fact, unlike FASTP, FASTA may yield initial scores that are higher than the corresponding optimized scores.

Fig. 1. Identification of sequence similarities by FASTA. The figure shows the four steps FASTA uses to calculate the initial and optimal similarity scores between two sequences. (A) Identify regions of identity. (B) Scan the regions using a scoring matrix and save the best initial regions. Initial regions with scores less than the joining threshold (27) are dashed. The asterisk denotes the highest scoring region reported by FASTP. (C) Optimally join initial regions with scores greater than a threshold. The solid lines denote the regions that are joined to make up the optimized initial score. (D) Recalculate an optimized alignment centered on the highest scoring initial region. The dotted lines denote the bounds of the optimized alignment. The result of this alignment is reported as the optimized score.
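The band-limited optimization can be sketched as a Smith-Waterman recurrence evaluated only for cells whose diagonal lies within `width` of the chosen initial region. This sketch assumes a linear gap cost and a hypothetical toy DNA scoring function. The band width and gap value are illustrative, not FASTA's defaults.

```python
NEG_INF = float("-inf")


def toy_score(a, b):
    """Hypothetical DNA scoring: +5 for an identity, -4 otherwise."""
    return 5 if a == b else -4


def banded_local_score(query, library, diag, width, gap=8, score=toy_score):
    """Smith-Waterman local score restricted to cells with
    |(i - j) - diag| <= width, using a linear gap cost `gap` per residue."""
    best = 0
    prev = {}  # previous DP row: column -> score
    for i in range(1, len(query) + 1):
        cur = {}
        lo = max(1, i - diag - width)
        hi = min(len(library), i - diag + width)
        for j in range(lo, hi + 1):
            h = max(0,
                    prev.get(j - 1, 0) + score(query[i - 1], library[j - 1]),
                    prev.get(j, NEG_INF) - gap,     # gap in the library
                    cur.get(j - 1, NEG_INF) - gap)  # gap in the query
            cur[j] = h
            best = max(best, h)
        prev = cur
    return best


if __name__ == "__main__":
    # width=0 allows no gaps (score 31); width=1 admits the insertion (32).
    print(banded_local_score("AAAACCCC", "AAAAGCCCC", diag=0, width=1))
```

Only about (2 * width + 1) cells per row are computed, so the cost grows with the band width rather than with the product of the sequence lengths.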

Local Similarity Analyses. Molecular biologists are often interested in detecting similar subsequences within longer sequences. FASTP and FASTA report only the single highest scoring alignment between two sequences. In contrast, local sequence comparison tools can identify multiple alignments between smaller portions of two sequences. Local similarity searches can clearly show the results of gene duplications (see Fig. 2) or repeated structural features (see Fig. 3). They are frequently displayed as a "graphic matrix" plot (7), in which regions of local similarity can be detected by eye. Optimal algorithms for sensitive local sequence comparison (6, 8, 9) can require a great deal of time and memory. This makes them impractical on microcomputers and, for longer sequences, on larger machines as well.
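A graphic matrix plot needs very little code. The sketch below marks a cell whenever the word of `word` residues starting at position i of one sequence equals the word starting at position j of the other. Filtering by word length is one common way to reduce noise. It is an assumption here, not a description of the method in reference (7).

```python
def dot_plot(seq_a, seq_b, word=1):
    """Rows of an ASCII graphic matrix: '*' where the length-`word`
    substrings starting at seq_a[i] and seq_b[j] are identical."""
    rows = []
    for i in range(len(seq_a) - word + 1):
        rows.append("".join(
            "*" if seq_a[i:i + word] == seq_b[j:j + word] else "."
            for j in range(len(seq_b) - word + 1)))
    return rows


if __name__ == "__main__":
    # A short tandem repeat shows up as a line parallel to the main diagonal.
    print("\n".join(dot_plot("GATCGATC", "GATCGATC", word=3)))
```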

LFASTA, the program for detecting local similarities, uses the same first two steps as FASTA to find initial regions. However, instead of saving 10 initial regions, LFASTA saves all diagonal regions with similarity scores greater than a threshold.

The two programs also differ in how they construct optimized alignments. FASTA focuses on a single region, whereas LFASTA computes a local alignment for each initial region. Thus LFASTA considers all of the initial regions shown in Fig. 1B, not just the diagonal shown in Fig. 1D. LFASTA also considers more than the band around each initial region: it examines potential sequence alignments for some distance before and after the initial region. The search proceeds as follows:

(i) Starting at the end of the initial region, an optimization (6) runs in the reverse direction until all possible alignment scores have gone to zero.
(ii) The location of the maximal local similarity score found in the reverse direction is used to start a second optimization that runs in the forward direction.
(iii) An optimal path starting from the forward maximum is then displayed (5).

The local homologies can be displayed as sequence alignments (see Fig. 2B) or as a two-dimensional graphic matrix style plot (see Figs. 2A and 3).
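The reverse-then-forward search can be illustrated with an ungapped simplification along a single diagonal. LFASTA itself performs a gapped optimization (6). Dropping gaps is an assumption made here only to keep the sketch short, and the scoring function is a hypothetical toy.

```python
def toy_score(a, b):
    """Hypothetical DNA scoring: +5 for an identity, -4 otherwise."""
    return 5 if a == b else -4


def extend_region(query, library, diag, end, score=toy_score):
    """Ungapped reverse-then-forward extension on diagonal diag = i - j.

    `end` is the exclusive query end of an initial region. Returns
    (score, start, stop) in query coordinates, stop exclusive.
    """
    # Reverse pass: walk left from the region's last residue until the
    # running score falls to zero, remembering where it peaked.
    run, best, start = 0, 0, end
    i = end - 1
    while i >= 0 and i - diag >= 0:
        run += score(query[i], library[i - diag])
        if run <= 0:
            break
        if run > best:
            best, start = run, i
        i -= 1
    # Forward pass: restart at the reverse-pass peak and walk right.
    run, best, stop = 0, 0, start
    i = start
    while i < len(query) and i - diag < len(library):
        run += score(query[i], library[i - diag])
        if run <= 0:
            break
        if run > best:
            best, stop = run, i + 1
        i += 1
    return best, start, stop


if __name__ == "__main__":
    # The initial region ends at query position 6; the alignment extends to 10.
    print(extend_region("TTACGTACGTTT", "GGACGTACGTGG", diag=0, end=6))
```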

Statistical Significance. Our rapid sequence comparison algorithms also provide tools for evaluating the statistical significance of an alignment. The NBRF protein sequence library contains approximately 5000 protein sequences, with 1.1 million amino acid residues. Any program that searches the library by calculating a similarity score for each sequence will find a highest scoring sequence, whether or not its alignment with the query is biologically meaningful. The previous version of FASTP was accompanied by RDF, a program for evaluating statistical significance. RDF compares one sequence with randomly permuted versions of the potentially related sequence.

We have written a new version of RDF, RDF2, with several improvements:

(i) RDF2 calculates three scores for each shuffled sequence: one from the best single initial region (as found by FASTP), one from the joined initial regions (as used by FASTA), and one from the optimized diagonal.
(ii) RDF2 can evaluate amino acid or DNA sequences and lets the user specify the scoring matrix. Thus sequences found with the PAM250 scoring matrix can be evaluated with the identity or genetic code matrix.
(iii) The user may choose either a global or a local shuffle routine.

Locally biased amino acid or nucleotide composition is perhaps the most common reason for high similarity scores of dubious biological significance (10). High scoring alignments between query and library sequences may be due to patches of hydrophobic or charged amino acid residues, or to A + T- or G + C-rich regions in DNA.

A simple Monte Carlo analysis builds random sequences by taking each residue of one sequence and placing it randomly along the length of the new sequence. This global shuffle breaks up patches of biased composition. As a result, the scores of the shuffled sequences may be much lower than that of the unshuffled sequence, and the two sequences may appear to be related even when they are not.

Alternatively, shuffled sequences can be constructed by permuting residues within small blocks of 10 or 20 residues. This destroys the order of the sequence but not its local composition. For example, G + C- or A + T-rich patches in DNA are left undisturbed. A local shuffle is therefore a more stringent test of significance than a global shuffle, and in some circumstances both should be used together. Two proteins that share a common evolutionary ancestor may have clearly significant similarity scores under either shuffling strategy. In contrast, proteins related only by secondary structure or hydropathic profile may have similarity scores whose significance drops dramatically when local shuffling is used instead of global shuffling.
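The difference between the two shuffling strategies is easy to state in code. The sketch below assumes Python's `random` module as the source of permutations. These are illustrative functions, not the RDF2 source.

```python
import random


def global_shuffle(seq, rng):
    """Permute every residue: order and local composition are both lost."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def local_shuffle(seq, window, rng):
    """Permute residues only within consecutive blocks of `window`
    residues, so each block keeps its own composition."""
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)
        out.extend(block)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(1)
    dna = "GCGCGCGCGC" + "ATATATATAT"   # a G + C-rich patch, then an A + T-rich one
    print(global_shuffle(dna, rng))     # the two patches are usually mixed
    print(local_shuffle(dna, 10, rng))  # each 10-base patch keeps its bases
```

A Monte Carlo significance test would score the query against many such shuffles of the library sequence and compare the observed score with that distribution.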

Implementation. The FASTA/LFASTA package of sequence analysis tools is written in the C programming language. It has been implemented under the Unix, VAX/VMS, and IBM PC DOS operating systems. Versions that run on the IBM PC are limited to query sequences of 2000 residues; library sequences can be of any length. Copies of the program are available from the authors.

Table 1. FASTA and FASTP initial scores of the T-cell receptor (RWMSAV) versus the NBRF data base

                                                  Initial score
NBRF code   Sequence                              FASTA   FASTP
RWHUAV      T-cell receptor α chain                 155      98
KIHURE      Ig κ chain V-I region                   127     111
KVMS50      Ig κ chain V region                     149      62
KVMSM6      Ig κ chain precursor V regions          141      64
KVRB29      Ig κ chain V region                     126      54
L3HUSH      Ig λ chain V-III region                  90      47
KVMS41      Ig κ chain precursor V region            87      87
RWMSBV      T-cell receptor β-chain precursor        94      94
RWHUVY      T-cell receptor β-chain precursor        91      59
RWHUGV      T-cell receptor γ-chain precursor        87      61
RWHUT4      T-cell surface glycoprotein T4           86      63
RWMSVB      T-cell receptor γ-chain precursor        71      41
HVMS44      Ig heavy-chain V region                  67      36
GIHUDW      Ig heavy-chain V-II region               62      35

The average FASTP score = 26.1 ± 6.8 (mean ± SD). The average FASTA score = 26.2 ± 7.2 (mean ± SD). The mean and SD were computed excluding scores >54. V, variable.

Although FASTA and LFASTA were designed for protein and DNA sequence comparison, their general method can be applied to any alphabet with arbitrary match/mismatch scoring values. All the scoring parameters can be specified without changing the program. These include the match/mismatch values, the values for the first residue in a gap and for subsequent residues in the gap, and other parameters that control the number of sequences to be saved and the histogram intervals.
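Every score is computed from a small set of parameters, so the same code serves any alphabet. As an illustration, the sketch below scores an existing gapped alignment from four values: a match value, a mismatch value, a value for the first residue of a gap, and a value for each subsequent gap residue. The parameter names and default values are hypothetical, not FASTA's option names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringParams:
    match: int = 5        # hypothetical values chosen for illustration
    mismatch: int = -4
    gap_first: int = -12  # first residue in a gap
    gap_extend: int = -4  # each subsequent residue in the gap

    def pair(self, a, b):
        return self.match if a == b else self.mismatch

    def gap(self, length):
        """Total value of a gap `length` residues long (0 if no gap)."""
        if length <= 0:
            return 0
        return self.gap_first + (length - 1) * self.gap_extend


def score_alignment(top, bottom, p=ScoringParams()):
    """Score two aligned rows of equal length; '-' marks a gap residue."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total, run_len, run_row = 0, 0, None
    for a, b in zip(top, bottom):
        row = "top" if a == "-" else "bottom" if b == "-" else None
        if row is not None and row == run_row:
            run_len += 1                 # the same gap continues
            continue
        total += p.gap(run_len)          # close any gap that just ended
        run_len, run_row = (1, row) if row else (0, None)
        if row is None:
            total += p.pair(a, b)
    return total + p.gap(run_len)        # a gap may end the alignment


if __name__ == "__main__":
    print(score_alignment("AAAA-CCCC", "AAAAGCCCC"))  # 8 * 5 - 12 = 28
```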

EXAMPLES 

Comparison of FASTA with FASTP. To demonstrate that the FASTA method computes better initial scores, we compared the protein sequence of a T-cell receptor α chain (NBRF code RWMSAV) with all sequences in the NBRF protein data base‡. We computed initial scores with both the present and the previous method. The T-cell receptor is a member of the immunoglobulin superfamily, which has 203 members in Release 12.0 of the data base.

FASTP placed 160 immunoglobulin superfamily sequences among the 200 top-scoring sequences, and 57 related sequences received initial scores less than four standard deviations above the mean score. FASTA placed 180 superfamily members among the 200 top-scoring sequences, and only 20 related sequences scored below four standard deviations above the mean.

Table 1 contains specific examples from this data base search. The two methods often differ little, but in a number of cases the new method obtains significantly higher scores between related sequences.
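The four-standard-deviation criterion can be made concrete with the background statistics quoted under Table 1, where the mean and SD were computed after excluding scores above 54. The sketch below is an illustration that assumes the sample standard deviation. It counts how many of the Table 1 scores fall below mean + 4 SD for each method.

```python
import statistics


def background(scores, exclude_above=54):
    """Mean and sample SD of library scores, ignoring scores > exclude_above."""
    kept = [s for s in scores if s <= exclude_above]
    return statistics.mean(kept), statistics.stdev(kept)


def threshold(mean, sd, k=4):
    """Score that lies k standard deviations above the mean."""
    return mean + k * sd


# Initial scores of the 14 related sequences listed in Table 1.
FASTA = [155, 127, 149, 141, 126, 90, 87, 94, 91, 87, 86, 71, 67, 62]
FASTP = [98, 111, 62, 64, 54, 47, 87, 94, 59, 61, 63, 41, 36, 35]

if __name__ == "__main__":
    for name, scores, mean, sd in (("FASTA", FASTA, 26.2, 7.2),
                                   ("FASTP", FASTP, 26.1, 6.8)):
        cut = threshold(mean, sd)
        below = sum(s < cut for s in scores)
        print(f"{name}: cutoff {cut:.1f}, {below} of {len(scores)} below")
```

With the Table 1 statistics, the cutoffs are about 55.0 for FASTA and 53.3 for FASTP. Every FASTA score in the table clears its cutoff, while four FASTP scores (47, 41, 36, and 35) do not.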

Nucleic Acid Data Base Search. FASTA can also search DNA sequence data bases in two ways. It can compare a DNA query sequence to the DNA library, or it can compare an amino acid query sequence to the DNA library by translating each library DNA sequence in all six possible reading frames.

We compared the 660-nucleotide rat transforming growth factor type α mRNA (GenBank locus RATTGFA) with all the mammalian sequences in Release 48 of GenBank§. With ktup = 4 (see Methods), the search took under 15 min on an IBM PC/AT microcomputer. Table 2 shows the 10 top-scoring library sequences. The 3 top-scoring sequences are clearly related to RATTGFA, but other high-scoring sequences are probably not related. Moreover, the mouse epidermal growth factor sequence, which the translated data base search finds (Table 3), is not among the top-scoring sequences here.

‡Protein Identification Resource (1987) Protein Sequence Database (Natl. Biomed. Res. Found., Washington, DC), Release 12.

§EMBL/GenBank Genetic Sequence Database (1987) (IntelliGenetics, Mountain View, CA), Tape Release 48.

Table 2. DNA data base search of rat transforming growth factor (RATTGFA) versus mammalian sequences

                                                      Score
GenBank locus  Sequence                        Initial   Optimized
HUMTGFAM       Human TGF mRNA
HUMTGFA2       Human TGF gene (…)
HUMTGFA1       Human TGF gene (5' end)             224         381
MUSRGEB3       Mouse 18S-5.8S-28S rRNA gene        140         107
MUSRGE52       Mouse 18S-5.8S-28S rRNA gene        140         107
MUSMHDD        MHC class I H-2D                    122          78
HUMMETIF1      Metallothionein (MT)Ip gene         116          92
MUSRGLP        45S rRNA (5' end)                   115          83
HUMPS2         pS2 mRNA                            105         106
MUSCIAII       α-1 type I procollagen               86          89

The 10 sequences having the highest initial scores are given. TGF, transforming growth factor; MHC, major histocompatibility complex.

To examine further the similarity detected between RATTGFA and MUSRGEB3, a mouse rRNA gene cluster, we used the RDF2 program for Monte Carlo analysis of statistical significance, with the local shuffling window set to 10 bases. Of the 50 shuffled comparisons (data not shown), 1 obtained an initial score greater than 140 (the observed initial score), and 9 obtained optimized scores greater than 107 (the observed optimized score). Therefore, the similarity between RATTGFA and MUSRGEB3 is unlikely to be significant.
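The Monte Carlo argument reduces to a simple count: the fraction of shuffled comparisons that score higher than the real one. With the counts reported above, this is 1/50 = 0.02 for the initial score and 9/50 = 0.18 for the optimized score. A minimal sketch follows. The shuffled scores in the example are invented placeholders, because the individual values were not shown.

```python
def shuffle_tail(observed, shuffled_scores):
    """Fraction of shuffled comparisons that score higher than `observed`."""
    if not shuffled_scores:
        raise ValueError("need at least one shuffled score")
    return sum(s > observed for s in shuffled_scores) / len(shuffled_scores)


if __name__ == "__main__":
    # Hypothetical placeholder scores for 50 shuffles (not the paper's data):
    # 49 low scores and one that beats the observed initial score of 140.
    shuffled = [95] * 49 + [150]
    print(shuffle_tail(140, shuffled))  # 0.02
```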

Translated Nucleic Acid Data Base Search. When searching for sequences that encode proteins, amino acid sequence comparisons are substantially more sensitive than DNA sequence comparisons. The reason is that they can use scoring matrices such as PAM250, which discriminate between conservative and nonconservative substitutions.

A variant of FASTA, TFASTA, compares a protein sequence to a DNA sequence library by translating the DNA sequences into each of the six possible reading frames "on the fly." TFASTA translates each DNA sequence from beginning to end. The translated protein sequence therefore includes both intron and exon sequences, and termination codons are translated into unknown (X) amino acids.

Table 3 shows the results of a translating search of the mammalian sequences in the GenBank DNA data base, with the RATTGFA protein sequence as the query and ktup = 1. In the translated search, the mouse epidermal growth factor now obtains an initial score higher than any unrelated sequence. However, HUMTGFA1, which was found in the DNA data base search, contains only 13 translated codons and is no longer among the top-scoring sequences.
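On-the-fly translation can be sketched as follows. The sketch assumes the standard genetic code. As described for TFASTA, it translates termination codons as X, and it does the same for any codon containing an ambiguous base. The frame ordering (three forward frames, then three reverse-complement frames) is an assumption, not TFASTA's documented numbering.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order: TTT, TTC, TTA, ...
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate codon by codon; stops ('*') and unknown codons become X."""
    aas = (CODONS.get(dna[p:p + 3], "X") for p in range(0, len(dna) - 2, 3))
    return "".join("X" if aa == "*" else aa for aa in aas)


def six_frames(dna):
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper().replace("U", "T")
    rev = dna.translate(COMPLEMENT)[::-1]
    return ([translate(dna[f:]) for f in range(3)]
            + [translate(rev[f:]) for f in range(3)])


if __name__ == "__main__":
    print(six_frames("ATGGCCTAA"))
    # ['MAX', 'WP', 'GL', 'LGH', 'XA', 'RP']
```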

Local Similarities. Fig. 2 displays the output of a local similarity analysis (ktup = 4) of two sequences. CHPHBAIM is a chimpanzee α1-globin mRNA. RABHBAPT is a rabbit α-globin gene that includes the complete coding sequence and a flanking pseudo-θ-globin gene. LFASTA can display either a graphic matrix style plot of the local homologies (Fig. 2A) or the alignments themselves (Fig. 2B). The right-most three alignments in Fig. 2A match the corresponding regions of the mRNA to exon subsequences from the pseudogene. For the comparison of CHPHBAIM and RABHBAPT, the FASTA initial score would be based on the three globin gene exons, whereas the FASTP initial score would be based on a single conserved exon.

Table 3. Translated DNA data base search of rat transforming growth factor (RATTGFA) versus mammalian sequences

                                                                  Score
GenBank locus  Sequence                                Frame   Initial   Optimized
RATTGFA        Rat TGF type α                              1       816         816
HUMTGFAM       Human TGF mRNA                              2       671         770
HUMTGFA2       Human TGF gene                              1       204         205
MUSEGF         Mouse EGF mRNA                              3        93         129
MUSMHAB3       Mouse MHC class II H-2 I-A                  1        91          58
MUSIGCD17      Mouse Ig germ-line DJC region              3'        85          48
HUMESTR        Human estrogen receptor                     3        83          65
RATINS1        Rat insulin 1 (Ins-1) gene                  2        81          63
MUSTHYS1       Mouse thymidylate synthase                  2        80          63
HUMPNU3        Human purine nucleoside phosphorylase       r        80          52

The 10 sequences having the highest initial scores are given. TGF, transforming growth factor; EGF, epidermal growth factor; D, diversity; J, joining; C, constant; MHC, major histocompatibility complex.

The Smith-Waterman optimization used in LFASTA can detect more subtle features than the eye can pick out in a graphic matrix plot. The path it traces is locally optimal, even when that path has only a slightly higher density of identities and conservative replacements than its surroundings.

Fig. 3 shows a plot from a local similarity self-comparison of the myosin heavy chain from the nematode Caenorhabditis elegans (MWKW) using the PAM250 matrix. The amino-terminal half of the molecule forms a large globular head without any periodic structure. The solid line down the main diagonal represents the expected identity of the sequence with itself. The symmetrical parallel lines along the carboxyl-terminal half of the molecule correspond to the 28-residue repeat responsible for the α-helical coiled-coil structure of the rod segment.
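A periodic repeat produces lines parallel to the main diagonal. This can be shown with a crude stand-in for the PAM250-scored self-comparison that counts identical residue pairs on each off-diagonal. The toy sequences below, a 7-residue unit repeated four times, are hypothetical. In a real coiled-coil protein the signal is far noisier, which is why scored local alignments are needed.

```python
def offdiagonal_identities(seq, max_offset):
    """Self-comparison: for each offset k = j - i (k > 0), count positions
    where seq[i] == seq[i + k]. A repeat of period p gives peaks at p, 2p, ..."""
    return {k: sum(seq[i] == seq[i + k] for i in range(len(seq) - k))
            for k in range(1, max_offset + 1)}


if __name__ == "__main__":
    counts = offdiagonal_identities("LKEAEQK" * 4, max_offset=15)
    print(max(counts, key=counts.get))  # 7
```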

DISCUSSION 

In searching a data base, one is attempting to measure relatedness. In aligning two homologous sequences, one is trying to choose the most likely set of mutations since their divergence from a common ancestral sequence. Thus any tool for analyzing sequence similarities must contain an implicit model of molecular evolution. An algorithm that guarantees the optimality of its alignments under a set of scoring rules must be judged on how well those rules fit our current understanding of molecular evolution. Algorithms that sacrifice realism to achieve greater efficiency require careful empirical evaluation, regardless of their mathematical rigor.

Our tools use rigorous algorithms at each step and incorporate a realistic model of evolution, but their hierarchical nature makes them heuristic. The original FASTP program has benefited from extensive use and evaluation by a wide variety of scientists. FASTA refines that approach and achieves a significant improvement in sensitivity. The LFASTA local similarity analysis program is also a logical extension of the FASTP approach.

Because of the trade-offs between sensitivity and selectivity in data base searches, the results of any search must be evaluated carefully (1, 11). This applies particularly to alignment scores that are not clearly separated from the distribution of all library sequence scores. A Monte Carlo analysis of statistical significance, such as RDF2 provides, can often be critical in evaluating a borderline similarity.

We previously suggested ranges of z values corresponding to approximate significance levels, where z = (observed score - mean of shuffled scores)/standard deviation of shuffled scores. However, z values from a Monte Carlo analysis become less useful as the distribution of shuffled scores departs from a normal distribution, as it does with FASTA. We therefore now focus on the highest scores of the shuffled sequences. For example, if several of 50 shuffled comparisons score as high as or higher than the observed score, the observed similarity is not a particularly unlikely event. One can be more confident if no random score in 200 shuffled comparisons approaches the observed score. In general, experience has led us to be conservative in evaluating an observed similarity in an unlikely biological context.

CHPHBA GACTCAGAAAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCG
       :::: :::: ::: ::::::::::::::::::: :: :::::::::::: ::::: : :
RABHBA GACTGAGAAGGAA-CCACCATGGTGCTGTCTCCCGCTGACAAGACCAACATCAAGACTG

CHPHBA CCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGG
       ::::::  ::: :::::   :::: :::::::::::: :: :::::: ::::::::
RABHBA CCTGGGAAAAGATCGGCAGCCACGGTGGCGAGTATGGCGCCGAGGCCGTGGAGAGG

Fig. 2. Local comparison of an α-globin mRNA sequence with an α-globin gene cluster. An ape α1-globin mRNA sequence (GenBank sequence CHPHBAIM) was compared with a rabbit α-globin gene sequence (RABHBAPT) containing a second, pseudo-θ-globin gene using the LFASTA program. (A) A plot of the homologous regions shared by the two sequences. (B) One of the alignments between the mRNA sequence and the rabbit α-globin gene (nucleotides 171-855). Three other alignments between the mRNA sequence and the α-globin gene, and three alignments with the pseudo-θ-globin gene (nucleotides 3200-3770), were calculated but are not shown. There is 84.3% identity in the 115-nucleotide overlap. The initial region and optimized scores using LFASTA are 284 and 304, respectively. X denotes the ends of the initial region found by LFASTA.

Fig. 3. Repeated structure in the myosin heavy chain. LFASTA was used to compare the Caenorhabditis elegans myosin heavy chain protein sequence (NBRF code MWKW) with itself using the PAM250 scoring matrix. Solid, dashed, and dotted lines denote decreasing similarity scores. The solid lines had initial region scores greater than 80 and optimized local scores greater than 150. The longer dashed lines had initial region and optimized local scores greater than 65 and 120, respectively. The shorter dashed lines had initial region and optimized local scores greater than 50 and 100, respectively. Homologous regions with lower scores are plotted with dots.
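The z value defined in the Discussion can be computed side by side with the count of high shuffled scores that the authors now prefer. A minimal sketch using Python's `statistics` module follows. The shuffled scores in the example are invented for illustration.

```python
import statistics


def z_value(observed, shuffled_scores):
    """(observed - mean of shuffled) / SD of shuffled, using the sample SD."""
    mu = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    return (observed - mu) / sd


def n_at_or_above(observed, shuffled_scores):
    """How many shuffled scores are as high as or higher than observed."""
    return sum(s >= observed for s in shuffled_scores)


if __name__ == "__main__":
    shuffled = [22, 25, 27, 30, 31, 24, 26, 29, 60]  # hypothetical; one outlier
    print(round(z_value(55, shuffled), 2), n_at_or_above(55, shuffled))
```

In the example, a single high shuffled score both inflates the SD and beats the observed score. With a skewed distribution of shuffled scores, the count is the more informative of the two numbers.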

These programs provide a group of sequence analysis tools that use a consistent measure for scoring similarity and constructing alignments. FASTA, RDF2, and LFASTA all use the same scoring matrices and similar alignment algorithms. As a result, potentially related library sequences found in a data base search can be evaluated further from a variety of perspectives. In addition, LFASTA can show alternative alignments between sequences with periodic structures or duplications.
Introduction

Today, the most powerful method for inferring the biological function of a gene (or the protein it encodes) from its sequence is similarity searching in protein and DNA sequence databases. Rapid methods for sequence comparison, using both heuristic algorithms and powerful parallel computers, have made discoveries based solely on sequence homology routine. Indeed, protein sequence similarity alone is the basis for the vast majority of gene identifications in the recent descriptions of the Haemophilus influenzae (Fleischmann et al., 1995), Mycoplasma genitalium (Fraser et al., 1995), yeast (Dujon, 1996), and Methanococcus jannaschii (Bult et al., 1996) genomes. As more complete genomes become available, protein sequence comparison will become an even more powerful tool for understanding biological function.

Protein sequence comparison is a powerful tool because of the enormous amount of information that is preserved throughout the evolutionary process. For many protein sequences, an evolutionary history can be traced back 1-2.5 billion years. Proteins that share a common ancestor are called homologous. Sequence comparison is most informative when it detects homologous proteins. Homologous proteins always share a common three-dimensional folding structure, and they often share common active sites or binding domains. Frequently, homologous proteins share common functions, but sometimes they do not. Our ability to characterize the biological properties of a protein based on sequence data alone stems almost exclusively from properties conserved through evolutionary time. Predictions of common properties for non-homologous proteins (similarities that have arisen by convergence) are much less reliable.

While sequence similarity searching is a routine method for characterizing newly determined DNA and protein sequences, researchers sometimes fail to exploit fully the information that is available from similarity searches of protein sequence databases. This review examines two strategies for using similarity search information more effectively: (i) looking for alignments that span an entire folding domain, rather than a short sequence motif, and (ii) re-examining sequences with high, but not statistically significant, similarity scores. For a broader perspective on sequence comparison and identification of homologous proteins, see Altschul et al. (1994) and Pearson (1996).

Members of the trypsin-like serine protease superfamily ('trypsin-like' distinguishes these serine proteases from other serine protease families, notably the subtilisins, that use serine in the active site but have very different structures and thus are not homologous) provide a classic example of a family of proteins with a highly conserved active site. While highly conserved motifs from this site are informative, serine proteases share similarity throughout the length of the protease domain, not just around the active site residues.

The trypsin-like serine protease family is quite diverse, with a number of very distantly related homologues. Thus, it can be difficult to demonstrate, based on sequence similarity alone, that Streptomyces griseus protease A and protease B are homologous to the mammalian trypsin-like serine proteases. The second part of this review shows that by carefully re-examining sequences with high, but not statistically significant, similarity scores, it is possible to identify several proteins that share significant similarity with both the mammalian trypsin-like serine proteases and their distant prokaryotic homologues.

Motifs, homology, and the serine proteases 

A common misconception in protein sequence comparison is that homologous proteins share sequence similarity mostly (or only) near the active site regions or other functional domains in a protein. This partly accounts for the popularity of databases of sequence motifs, such as PROSITE (Bairoch, 1991), which tabulate amino acid patterns that can be used to identify most of the members of a protein family. For features that result from convergence to a common property, such as glycosylation and phosphorylation sites, sequence motifs are uniquely informative. However, for features that result from divergence from a common ancestor, such as the serine protease active site residues, sequence motifs provide only a highly abstracted summary of the sequence conservation in a family. Because they share a common three-dimensional structure, homologous proteins share sequence similarity over large regions, typically the entire protein fold.

The trypsin-like serine protease superfamily is a classic example of a protein family whose members share several simple motifs that are diagnostic for the family (Figure 1).
ID   TRYPSIN_HIS; PATTERN.
AC   PS00134;
DE   Serine proteases, trypsin family, histidine active site
PA   [LIVM]-[ST]-A-[STAG]-H-C.
NR   /TOTAL=158(158); /POSITIVE=154(154); /UNKNOWN=2(2); /FALSE_POS=2(2);
NR   /FALSE_NEG=11(11);
CC   /TAXO-RANGE=??EP?; /MAX-REPEAT=1;
CC   /SITE=5,active_site;

ID   TRYPSIN_SER; PATTERN.
AC   PS00135;
DE   Serine proteases, trypsin family, serine active site
PA   G-D-S-G-G.
NR   /TOTAL=160(160); /POSITIVE=151(151); /UNKNOWN=1(1); /FALSE_POS=8(8);
NR   /FALSE_NEG=16(16);
CC   /TAXO-RANGE=??EP?; /MAX-REPEAT=1;
CC   /SITE=3,active_site;

Fig. 1. Patterns for serine proteases. Patterns from PROSITE that identify 152/163 TRYPSIN_HIS or 143/159 TRYPSIN_SER members of the trypsin-like serine protease protein family.



Serine proteases cleave peptide bonds using a 'catalytic triad' of histidine, serine and aspartic acid residues that are required for the protease function. Because these residues are so highly conserved, patterns that focus on two of the regions (Figure 1) can be used to identify almost every member of the serine protease family. (The subtilisin-like serine proteases use exactly the same catalytic triad, but the families are non-homologous, with very different three-dimensional structures.)
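A PROSITE pattern such as those in Figure 1 is essentially a restricted regular expression. The Python sketch below translates the simple syntax used there (residues or [..] alternatives separated by '-', a closing period) plus the common x, {..} and (n) elements into a Python regex and scans a sequence. The function names and the test fragment are illustrative assumptions, and the translator deliberately does not cover every construct described in the PROSITE user manual.

```python
import re

# One PROSITE element: optional N-anchor '<', a residue, [alternatives], {exclusions}
# or 'x', an optional repeat count (n) or (n,m), and an optional C-anchor '>'.
ELEMENT = re.compile(r"(<)?(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?(>)?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern such as 'G-D-S-G-G.' into a Python regex."""
    pattern = "".join(pattern.split()).rstrip(".")  # drop whitespace and the final period
    pieces = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_anchor, core, low, high, c_anchor = m.groups()
        if core == "x":
            piece = "."  # any residue
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            piece = core  # a single residue or a [..] class is already regex syntax
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(("^" if n_anchor else "") + piece + ("$" if c_anchor else ""))
    return "".join(pieces)


def find_motifs(sequence: str, pattern: str):
    """Return (0-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    his_site = "[LIVM]-[ST]-A-[STAG]-H-C."   # TRYPSIN_HIS, Figure 1
    ser_site = "G-D-S-G-G."                  # TRYPSIN_SER, Figure 1
    fragment = "NSQWVVSAAHCYKSGDSCQGDSGGPVV"  # illustrative test string
    print(find_motifs(fragment, his_site))   # [(5, 'VSAAHC')]
    print(find_motifs(fragment, ser_site))   # [(19, 'GDSGG')]
```

A match only says that the pattern occurs; the surrounding discussion explains why similarity over the whole fold, not the motif, is the stronger evidence of homology.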

Most members of the trypsin-like serine protease superfamily are readily identified by sequence similarity searching. The results from a typical protein database search using the Smith-Waterman algorithm (Smith and Waterman, 1981) are shown in Figure 2. All of the eukaryotic trypsin-like serine proteases share statistically significant similarity with the bovine trypsin query sequence. However, as is often the case for divergent protein families, some prokaryotic members of the family do not share statistically significant similarity with bovine trypsin. These sequences are italicized in Figure 2; their membership in the serine protease family is usually inferred from their common three-dimensional structures (Figure 5).
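The Smith-Waterman algorithm cited here finds the best-scoring local alignment by dynamic programming. The sketch below uses affine gap penalties (Gotoh's three-matrix recurrence). As a stated simplification it scores residues with a toy match/mismatch rule instead of a substitution matrix such as those used by SSEARCH, and the default penalty values are hypothetical.

```python
from typing import Tuple

NEG_INF = float("-inf")


def smith_waterman_affine(a: str, b: str, match: int = 5, mismatch: int = -4,
                          gap_open: int = 12, gap_extend: int = 2) -> Tuple[int, int, int]:
    """Return (best local score, end position in a, end position in b), 1-based ends.

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]        # best score of any alignment ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # best score ending with a gap in a
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # best score ending with a gap in b
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend an existing one.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 term is what makes the alignment local.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    return best, best_i, best_j


if __name__ == "__main__":
    print(smith_waterman_affine("ACDEFGHIKL", "ACDEFXXGHIKL"))  # (36, 10, 12)
```

In the example, two five-residue matching blocks (25 each) are joined across a two-residue gap costing 12 + 2; with a much larger gap_open the gap is no longer worth paying for and the best score drops back to a single 25-point block.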

This absolute conservation of residues in the 'catalytic triad' might suggest that sequence similarities shared by members of this family are limited to those regions. Indeed, two of the four 'High-Scoring Segment Pairs' (Altschul et al., 1994) reported by BLASTP correspond to the TRYP_HIS and TRYP_SER regions (Figure 3). However, similarity in the serine proteases extends from one end of the protein to the other, with conservation throughout the sequence. In fact, many parts of the protein are conserved more strongly than the region around the aspartic acid in the catalytic triad (Figure 3). Thus, while the residues in the catalytic triad are an essential feature of a functional serine protease, it is the serine protease fold (two domains containing anti-parallel β-barrels; Figure 5) that is required to bring these residues together. The evolutionary pressure to conserve the trypsin-like serine protease fold ensures that the folding domains share similar amino acids.

The requirement for a common folded structure in homologous proteins usually causes similarities to extend from one end of the protein to the other. With the exception of mosaic proteins that are the result of recent exon shuffling (Doolittle, 1995), optimal local sequence similarity is rarely confined to only a portion of two homologous sequences. (In mosaic proteins, the similarity extends throughout the exon-shuffled domain.) In general, it is incorrect to speak of homology at the N terminus or C terminus, even though only a portion of the protein may be aligned in 'High-Scoring Segment Pairs' by BLASTP. Indeed, the length of the locally similar region can sometimes be used to distinguish low-scoring related sequences from high-scoring unrelated sequences. Thus, all but two of the library sequences (including four with expectation values >0.02) that align over >80% of the length of the TRYP_BOVIN query sequence are members of the trypsin-like serine protease family. Figure 4 displays the locally similar regions for the related and unrelated sequences in Figure 2; the highest-scoring unrelated sequences tend to have relatively short (<100 residue) regions of higher similarity (~30% identical), while related sequences have longer (140-300 residue) aligned regions, sometimes with lower (25%) sequence identity. In general, longer alignments with lower identity are more significant than shorter alignments with higher identity.
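The coverage observation in this paragraph (flagging library sequences whose local alignment covers more than 80% of the query) is easy to apply to search output. The sketch below assumes each hit has already been reduced to a record holding the query start and end of its best local alignment; the `Hit` type, its field names and the example hits are hypothetical and do not reflect any FASTA output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    name: str
    query_start: int  # 1-based first query residue in the local alignment
    query_end: int    # 1-based last query residue in the local alignment
    e_value: float


def query_coverage(hit: Hit, query_length: int) -> float:
    """Fraction of the query sequence spanned by the hit's local alignment."""
    return (hit.query_end - hit.query_start + 1) / query_length


def long_alignments(hits: List[Hit], query_length: int, min_coverage: float = 0.8) -> List[str]:
    """Names of hits whose alignment covers more than min_coverage of the query."""
    return [h.name for h in hits if query_coverage(h, query_length) > min_coverage]


if __name__ == "__main__":
    query_length = 230  # hypothetical query length
    hits = [
        Hit("HIT_A", 10, 220, 2.0),  # long alignment, non-significant E()
        Hit("HIT_B", 50, 130, 0.5),  # short alignment with a better E()
    ]
    for h in hits:
        print(h.name, round(query_coverage(h, query_length), 2), h.e_value)
    print(long_alignments(hits, query_length))  # ['HIT_A']
```

Coverage is a supporting clue for separating low-scoring relatives from high-scoring unrelated sequences; it does not replace the E() value.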

The requirement for similarity over a large region is more evident when three-dimensional structures are examined. TRYP_BOVIN (structure not shown), TRYP_STRGR
LOCUS        Description
TRYP_BOVIN   trypsinogen (EC 3.4.21.4)
TRY2_HUMAN   trypsinogen II
TRYP_PLEPL   trypsin
KLK2_HUMAN   glandular kallikrein 2
RVVA_VIPRU   Vipera russelli proteinase
TRY1_ANOGA   trypsin 1
TRYA_DROME   trypsin alpha
FA9_RAT      coagulation factor IX
PLMN_PIG     plasminogen
TRY5_ANOGA   trypsin 5
TRYP_FUSOX   trypsin
FA7_RABIT    coagulation factor VII
URTB_DESRO   salivary plasminogen activator β
ACRO_PIG     acrosin
PRTC_HUMAN   protein C
TRYH_CANFA   mastocytoma protease
TRYP_STRGR   trypsin
HGF_HUMAN    hepatocyte growth factor prec.
ACH1_LONAC   achelase I protease
CERC_SCHMA   cercarial protease
CO2_HUMAN    complement C2
CFAB_MOUSE   complement factor B
PRTZ_BOVIN   vitamin K-dependent protein Z
LORI_MOUSE   loricrin
GSEP_BACLI   glutamyl endopeptidase
KRUC_SHEEP   keratin, ultra high-sulfur matrix
PRLA_LYSEN   alpha-lytic protease
AGI_URTDI    lectin/endochitinase precursor
KCR8_YEAST   prob. serine/threonine-protein kinase
G156_PARPR   156g surface protein precursor
YLK3_CAEEL   putative ser./thr.-protein kinase
AMY_CLOAB    putative alpha-amylase
AGI_HORVU    root-specific lectin precursor
YB9X_YEAST   hypothetical trp-asp repeats
PRTS_MOUSE   vitamin K-dependent protein S
DLK_HUMAN    delta-like protein
PRTB_STRGR   streptogrisin B (S. gris. prot. B)
PRTA_STRGR   streptogrisin A (S. gris. prot. A)

Fig. 2. Serine protease search: high-scoring sequences. High-scoring sequences from a search of SwissProt (Bairoch and Boeckmann, 1991; release 33, April 1996) with TRYP_BOVIN. Only 10% of the database sequences with E() < 10^ are shown. Trypsin-like serine proteases with E() > 0.02 are in italics.



(Figure 5, 1sgt) and PRTA_STRGR (1sgc) share a very similar all-β fold with symmetrical β-barrel structures and two short α helices. Very little of this structure is directly involved in forming the catalytic triad in the active site; yet the entire fold is conserved, thus requiring conservation of an amino acid sequence that adopts this fold.

Although almost all vertebrate trypsin-like serine proteases share significant sequence similarity with bovine trypsin, most bacterial serine proteases do not. For example, the similarity score for alignment of bovine trypsin with S. griseus protease A is not statistically significant (E() < 64), even though the structures of the two enzymes are very similar (Figure 5). Thus, while statistically significant similarity generally implies common ancestry, and thus common three-dimensional structure [the most common exceptions to this rule are regions with very low amino acid complexity, e.g. YSGGGGSSCGGGYSGGGGSSCGGGSSGGG from LORI_MOUSE (Altschul et al., 1994)], lack of statistically significant similarity does not imply non-homology.
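The low-complexity exception can be screened for with a simple composition measure. The sketch below computes the Shannon entropy (bits per residue) of a sequence and of sliding windows along it. It is a simplified stand-in for dedicated low-complexity filters such as SEG, not a reimplementation of them; the 12-residue window is an assumption, and the review gives no entropy threshold.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(seq: str) -> float:
    """Shannon entropy of the residue composition, in bits per residue."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def window_entropies(seq: str, window: int = 12) -> List[float]:
    """Entropy of every length-`window` window (the window size is an assumption)."""
    return [shannon_entropy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    loricrin_repeat = "YSGGGGSSCGGGYSGGGGSSCGGGSSGGG"  # LORI_MOUSE segment quoted in the text
    test_string = "IVGGYTCGANTVPYQVSLNSGYHFCGGSLINSQWVVSAAHC"  # arbitrary comparison string
    print(round(shannon_entropy(loricrin_repeat), 2))  # about 1.5 bits per residue
    print(round(shannon_entropy(test_string), 2))      # about 3.7 bits per residue
    print(round(min(window_entropies(loricrin_repeat)), 2))
```

The glycine/serine repeat is dominated by two residues, so its entropy is far lower than that of an ordinary protein segment; alignments confined to such regions deserve the caution described above.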

Figure 5 also shows the structures of two non-homologous proteins. Subtilisin (1sbt) is included because it is an example of 'convergent' evolution (Doolittle, 1994); subtilisin uses the same triad of catalytic residues (Asp, His and Ser) to cleave peptide bonds, but shares no structural similarity beyond the geometry of the active site of the enzyme. Subtilisin and subtilisin-like serine proteases are not homologous to the trypsin-like serine proteases. As expected, the different structures share no statistically significant sequence similarity (1500 random sequences from SwissProt would be expected to have a better similarity score than that obtained in the trypsin/subtilisin comparison).

Likewise, high-scoring sequences that are not homologous to trypsin-like serine proteases rarely share structural similarity with the family, despite their 'strong' sequence similarity. Wheat germ agglutinin (7wga) is the most similar non-serine protease sequence in the NRL_3D database of sequences whose structures are known, yet it does not contain a single β sheet. With the exception of membrane-spanning proteins, which frequently share hydrophobic regions with other unrelated membrane proteins, high sequence similarity in the absence of homology provides no information about structural similarity.

>>TRYP_STRGR TRYPSIN PRECURSOR (EC 3.4.21.4) (SGT) (259 aa)
Smith-Waterman score: 410; 34.211% identity in 228 aa overlap

Fig. 3. Alignment of bovine trypsinogen (TRYP_BOVIN) and S. griseus trypsin (TRYP_STRGR).

Using statistical significance to explore distant relationships

A major advance in sequence identification by similarity searching has been the development of accurate statistical estimates for similarity scores (Altschul et al., 1994). Since the similarity score from comparison of TRYP_BOVIN and TRYP_STRGR has an expectation value of E() < 10^-20, we conclude that these two sequences share similarity that would essentially never be obtained by chance (a score this high would be expected once in 10^20 searches of a database the size of SwissProt), and thus their similarity reflects a common ancestry for the two sequences. Current versions of the FASTA package of sequence comparison programs (versions 2 and 3) include accurate statistical estimates for both FASTA and SSEARCH (Smith-Waterman) similarity scores (Pearson, 1996). Careful analysis of the high-scoring non-homologous sequences can be used both to confirm that the statistical estimates are reliable and to explore distantly related members of a protein family.
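An E() value is an expected count: the number of library sequences expected to reach a given score by chance in one search, so E() = 10^-20 corresponds to one such score per 10^20 searches. A minimal sketch of that arithmetic, assuming that scores of unrelated sequences follow an extreme-value (Gumbel) distribution with known parameters. The values of `lam`, `mu` and the database size are hypothetical; real programs estimate such parameters from data and correct for sequence length, which this sketch does not do.

```python
import math


def gumbel_p_value(score: float, lam: float, mu: float) -> float:
    """P(S >= score) for one comparison, assuming a Gumbel (extreme-value) distribution."""
    # 1 - exp(-exp(-lam * (score - mu))), written with expm1 for accuracy at tiny p
    return -math.expm1(-math.exp(-lam * (score - mu)))


def database_e_value(score: float, lam: float, mu: float, n_sequences: int) -> float:
    """Expected number of unrelated library sequences scoring >= score in one search."""
    return n_sequences * gumbel_p_value(score, lam, mu)


def score_for_e_value(e_target: float, lam: float, mu: float, n_sequences: int) -> float:
    """Inverse of database_e_value: the score whose E() equals e_target."""
    p = e_target / n_sequences
    return mu - math.log(-math.log1p(-p)) / lam


if __name__ == "__main__":
    LAM, MU, N = 0.2, 30.0, 50_000  # hypothetical parameters and database size
    for s in (60, 84.1, 150, 270):
        e = database_e_value(s, LAM, MU, N)
        print(f"score {s}: E() = {e:.3g}")
    print(round(score_for_e_value(0.02, LAM, MU, N), 1))
```

Because E() scales with the number of sequences searched, the same score is less significant in a larger database; the inverse function shows which score would be needed to reach the E() < 0.02 level used later in the review.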

Identifying the highest-scoring non-homologous sequences in a database search may seem difficult if the protein family is very diverse. However, additional searches with high-scoring, but possibly unrelated, sequences can be used to separate high-scoring unrelated sequences from distantly related sequences. Additional searches with high-scoring unrelated sequences will typically produce 'matches' with unrelated sequences, while additional searches with distantly related sequences will produce 'matches' to protein family members. If the statistical estimates are accurate, the highest-scoring unrelated sequences will have E() values of ~1.0, since about one unrelated sequence per search is expected to score that high by chance. If the E() values for the highest-scoring unrelated sequences are unexpectedly low and the sequences do not contain low-complexity simple sequence repeats, additional searches can be carried out with higher gap penalties.
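The expectation that the best-scoring unrelated sequence should have E() near 1.0 gives a quick calibration check. The sketch assumes each hit has already been labeled homologous or not (for example, by the follow-up searches described above); the tuple layout, the function names and the 0.1-10 tolerance band are hypothetical choices, not values from the text.

```python
from typing import List, Optional, Tuple

# (library sequence name, E() value, judged homologous after follow-up searches?)
HitRecord = Tuple[str, float, bool]


def top_unrelated(hits: List[HitRecord]) -> Optional[Tuple[str, float]]:
    """The highest-scoring (lowest E()) hit judged NOT to be a family member."""
    unrelated = [(name, e) for name, e, homologous in hits if not homologous]
    if not unrelated:
        return None
    return min(unrelated, key=lambda item: item[1])


def calibration_ok(e_value: float, low: float = 0.1, high: float = 10.0) -> bool:
    """True if the best unrelated E() is near 1.0; the [low, high] band is an assumption."""
    return low <= e_value <= high


if __name__ == "__main__":
    hits = [
        ("FAMILY_1", 1e-40, True),
        ("FAMILY_2", 0.5, True),
        ("UNRELATED_1", 1.3, False),
        ("UNRELATED_2", 4.0, False),
    ]
    best = top_unrelated(hits)
    print(best, calibration_ok(best[1]))  # ('UNRELATED_1', 1.3) True
```

A best unrelated E() far below 1 is the signal, per the text, to look for low-complexity repeats or to repeat the search with higher gap penalties.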

Bovine trypsin (TRYP_BOVIN) shares statistically significant similarity with every full-length mammalian serine protease, but the bacterial alpha-lytic protease (PRLA_LYSEN) and S. griseus proteases A and B do not share significant similarity with bovine trypsin. There is no question that these proteins are homologous to the mammalian trypsin-like enzymes because of their strong structural similarity (Figure 5). However, in the absence of high-resolution structural data, how can one decide whether a high-scoring, but not significantly similar, sequence is homologous?

Identifying distantly related protein sequences

Additional searches with the highest-scoring, non-significant matches allow us to identify additional members of the family. A search with PRTZ_BOVIN, which has a marginally significant score, shows strong similarity (E() values < 10^) with a variety of other members of the family, thus confirming its homology. LORI_MOUSE gives a different result; while many serine proteases are highly ranked with significant similarity, the sequence alignments contain a repeated glycine and serine motif. Thus, LORI_MOUSE is not homologous; it contains an unusual simple amino acid repeat sequence. On the other hand, GSEP_BACLI shares strong similarity with several bacterial serine proteases (E() < 10^) and weaker, but significant, similarity with TRYP_SACER and TRYP_FUSOX, Streptomyces and yeast trypsins with very strong similarity to bovine trypsin. GSEP_BACLI is, therefore, a member of the trypsin-like serine protease family.

Fig. 4. Serine protease alignments. The alignments of each of the high-scoring sequences reported in Figure 2 with the TRYP_BOVIN query sequence are shown. Thus, the alignment of TRYP_BOVIN with itself spans the entire query, while the alignment of TRYP_BOVIN and TRYA_DROME extends over 85% of the TRYP_BOVIN query sequence. Members of the family with E() > 0.02 are italicized. The E() value and percent identity are also shown. The ssearch -m 4 option was used to produce this figure.

A search with alpha-lytic protease reveals a second group of closely related serine proteases, which includes S. griseus protease A and protease B. While none of the sequences in Figure 2 has significant similarity with PRLA_LYSEN, GLUP_STRGR, an S. griseus glutamyl endopeptidase, shares strong similarity with S. griseus proteases A and B and alpha-lytic protease, and weaker, but significant, similarity with TRYA_DROME and several other Drosophila serine proteases (Figure 6). The insect sequences share strong similarity with mammalian trypsin-like serine proteases (Figure 2). Thus, by carefully exploring sequences with high, but not statistically significant, similarity scores, it is possible to construct statistically significant links between very distantly related serine proteases.

Distant sequence relationships can thus be established by moving from sequence A to significantly similar sequence B, and then from B to C, even though A does not share significant similarity with C. The strategy is effective because of the implicit evolutionary tree that connects all the members of a protein family. Thus, in Figure 7, a sequence on a relatively short branch, TRYA_DROME, can be used to establish significant relationships with very diverse members of the family. For large and diverse protein families, it is usually easy to identify a number of 'less-divergent' family members that can be used to link distant branches of the tree. Naturally, such inferences are more reliable if statistically significant similarity scores are produced with different sets of scoring matrices and gap penalties, and if they are established with several different linking sequences.

S. griseus trypsin (1sgt): E() < 10^-20, 34%, 228/259
S. griseus protease A (1sgc): E() < 64, 25%, 199/297
Subtilisin (1sbt): E() < 1500, 25%, 132/275
Agglutinin (7wga): E() < 57, 24%, 104/171

Fig. 5. Structures: homologous, convergent and unrelated. The structures of two members (1sgt, 1sgc) of the trypsin-like serine protease family are shown, along with subtilisin (1sbt), a non-trypsin-like serine protease, and wheat germ agglutinin (7wga), one of the highest-scoring non-serine proteases in the NRL_3D database (release 20) of sequences whose structures are known. Serine protease structures are aligned to present a similar view of the catalytic site. The expectation values shown are based on a comparison of bovine trypsin (TRYP_BOVIN) to the SwissProt (release 33) protein sequence database. Also shown are the percent identity and the length of the similar region with respect to the length of the sequence of the structure shown.
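The A-to-B-to-C strategy amounts to finding a path in a graph whose edges are statistically significant pairwise similarities. A sketch using breadth-first search, assuming the pairwise E() values have already been computed and using E() < 0.02 as the edge criterion. Apart from the E() < 64 value for bovine trypsin versus S. griseus protease A quoted in the text, the numbers in the example are hypothetical placeholders, and the function names are invented for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]


def significant_graph(e_values: Dict[Pair, float], threshold: float) -> Dict[str, List[str]]:
    """Undirected graph with an edge for every pair whose E() is below threshold."""
    graph: Dict[str, List[str]] = {}
    for (a, b), e in e_values.items():
        graph.setdefault(a, [])
        graph.setdefault(b, [])
        if e < threshold:
            graph[a].append(b)
            graph[b].append(a)
    return graph


def linking_path(e_values: Dict[Pair, float], start: str, goal: str,
                 threshold: float = 0.02) -> Optional[List[str]]:
    """Shortest chain start -> ... -> goal in which every step is significant (BFS)."""
    graph = significant_graph(e_values, threshold)
    if start not in graph or goal not in graph:
        return None
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[str] = []
            step: Optional[str] = node
            while step is not None:  # walk back to the start
                path.append(step)
                step = previous[step]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


if __name__ == "__main__":
    pairwise_e = {
        ("TRYP_BOVIN", "TRYA_DROME"): 1e-30,  # hypothetical value
        ("TRYA_DROME", "GLUP_STRGR"): 0.003,  # hypothetical value
        ("GLUP_STRGR", "PRTA_STRGR"): 1e-30,  # hypothetical value
        ("TRYP_BOVIN", "PRTA_STRGR"): 64.0,   # not significant (E() < 64 in the text)
    }
    print(linking_path(pairwise_e, "TRYP_BOVIN", "PRTA_STRGR"))
    # ['TRYP_BOVIN', 'TRYA_DROME', 'GLUP_STRGR', 'PRTA_STRGR']
```

In the spirit of the caution above, a chain is more convincing if it survives when `linking_path` is rerun on E() tables computed with different scoring matrices and gap penalties, or through different linking sequences.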

A phylogenetic tree was produced from selected vertebrate, invertebrate and prokaryotic trypsin-like serine proteases. Sequences were aligned using ClustalW (Thompson et al., 1994), and protein distances were estimated and distance trees built using the PHYLIP package (Felsenstein, 1989). The three numbers to the right of the sequence names report the statistical significance of the alignment score between the sequence and bovine trypsin (TRYP_BOVIN), Drosophila trypsin A (TRYA_DROME) and S. griseus glutamyl endopeptidase (GLUP_STRGR), respectively. MPR_BACSU is an example of another sequence that links eukaryotic and prokaryotic serine proteases, although it does not share statistically significant similarity with the three query sequences used for expectation values here.

MPR_BACSU 44/0.19/2.7
PRTA_STRGR 64/0.04/0.0

Fig. 7. Similarity and homology: a serine protease family tree.

Fig. 6. From glutamyl endopeptidase to TRYA_DROME.

Summary

Protein sequence comparison is the most powerful tool available today for inferring structure and function from sequences because of the constraints of protein evolution (a protein must fold into a functional structure), which are reflected in its sequence. Protein sequence similarity can routinely be used to infer relationships between proteins that last shared a common ancestor 1-2.5 billion years ago. Our ability to identify distantly related proteins has improved over the past 5 years with the use of optimized scoring parameters (Pearson, 1995) and the development of accurate statistical estimates. In using sequence similarity to infer homology, one should remember the following.

1. Always compare protein sequences if the genes encode proteins. Protein sequence comparison will typically double the look-back time over DNA sequence comparison.

2. Homologous sequences are usually similar over an entire sequence or domain. Matches that are >50% identical in a 20-40 amino acid region frequently occur by chance.

3. While most sequences that share statistically significant similarity (E() < 0.02) are homologous, many distantly related homologous sequences do not share significant similarity. (Significant similarity in low-complexity regions does not imply homology.)

4. By focusing on the statistical significance of a similarity 
and identifying the highest scoring unrelated sequence in a 
database search, you can both confirm that the statistical 
estimates are accurate and potentially identify distantly 
related family members. 

5. Homologous sequences share a common ancestor, and thus a common protein structure. Depending on the evolutionary distance and divergence path, two or more homologous sequences may have very few absolutely conserved residues. However, if homology has been inferred between A and B, between B and C, and between C and D, A and D must be homologous, even if they share no significant similarity when compared directly. In evaluating the results of a similarity search, remember that there is an evolutionary tree that connects the family members.

Motifs revisited 

This review argues that sequence similarity searching, rather than motif identification, is the most reliable method for identifying distantly related protein sequences. However, motif searches are frequently used to characterize a newly determined sequence. While motifs can be very valuable for identifying functional sites in a protein, one must be very careful in basing sequence identifications on motif patterns alone. Thus, if a newly determined protein sequence contains the G-D-S-G-G motif, but does not share strong similarity (E() < 20) with any of the hundreds of trypsin-like serine proteases in the protein databases, is it likely to be homologous to trypsin and share the same protein fold? It seems unlikely, since so many very distantly related members of the family are known. However, if a protein sequence shares high, but not significant (0.02 < E() < 20), sequence similarity with several distantly related members of the family, the presence of the two motifs in Figure 1 would provide strong supporting evidence that a new branch in the serine protease family had been found.

Alternatively, if a sequence shares significant similarity 
with proteins from several branches of the serine protease 
family tree, but does not contain the G-D-S-G-G motif, it is 
very likely that it adopts the serine protease protein fold, 
although it may not function as a protease. Thus, when 
enzymatic mechanisms are known, motifs can be used to 
confirm functional aspects of homologous proteins. However, 
in the absence of strong similarity to any member of a large 
protein family, motifs are unreliable for inferring protein 
homology. 
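The reasoning in this section can be summarized as a small decision rule. A sketch, assuming you already have E() values from comparisons of the new sequence with known family members and a yes/no motif scan (for example, with the Figure 1 patterns). The two E() thresholds are those quoted in the text; the function name, the return labels and the `min_borderline_hits` cut-off standing in for "several" are hypothetical.

```python
from typing import Sequence

SIGNIFICANT = 0.02  # E() below this is treated as significant (threshold used in the text)
WEAK_LIMIT = 20.0   # upper end of the "high, but not significant" range quoted in the text


def interpret(family_e_values: Sequence[float], has_motif: bool,
              min_borderline_hits: int = 2) -> str:
    """Combine similarity to known family members with a motif scan (heuristic labels)."""
    best = min(family_e_values, default=float("inf"))
    borderline = sum(SIGNIFICANT <= e < WEAK_LIMIT for e in family_e_values)
    if best < SIGNIFICANT:
        # Significant similarity: homology (and the fold) follows from the similarity itself;
        # the motif then speaks to function, not to homology.
        return "homologous; motif supports function" if has_motif else "homologous fold; function uncertain"
    if has_motif and borderline >= min_borderline_hits:
        # Several borderline hits plus the motifs: strong supporting evidence for a new branch.
        return "possible new family branch"
    if has_motif:
        # A motif without similarity to family members is unreliable for inferring homology.
        return "motif only; homology not supported"
    return "inconclusive"


if __name__ == "__main__":
    print(interpret([0.5, 3.0, 50.0], has_motif=True))  # possible new family branch
    print(interpret([30.0, 100.0], has_motif=True))     # motif only; homology not supported
```

The ordering of the checks mirrors the text: similarity is consulted first, and the motif is used only to confirm function or to strengthen a borderline case.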
Abstract

Structure-based mutational analysis of serine protease specificity has produced a large database of information useful in addressing biological function and in establishing a basis for targeted design efforts. Critical issues examined include the function of water molecules in providing strength and specificity of binding, the extent to which binding subsites are interdependent, and the roles of polypeptide chain flexibility and distal structural elements in contributing to specificity profiles. The studies also provide a foundation for exploring why specificity modification can be either straightforward or complex, depending on the particular system.

Keywords: enzyme kinetics; macromolecular recognition; protein engineering; protein-ligand interactions; protein structure; serine protease; site-directed mutagenesis; substrate specificity



Serine proteases were among the first enzymes to be studied extensively (Neurath, 1985). Interest in this family has been maintained in part by an increasing recognition of their involvement in a host of physiological processes. In addition to the biological role played by digestive enzymes such as trypsin, serine proteases also function broadly as regulators through the proteolytic activation of precursor proteins (Neurath, 1984; Van de Ven et al., 1993). Examples of this regulation include the processing of trypsinogen by enteropeptidase to produce active trypsin (Huber & Bode, 1978) and the cascades of zymogen activation that control blood clotting (Davie et al., 1991). Serine proteases have also recently been shown to play essential roles in cell differentiation. For example, the Drosophila trypsin-like enzymes Easter and Snake are important components in the specification of ventral and lateral patterns during development (Chasan & Anderson, 1989). Asymmetry of cell fates may be the result of a protease cascade involving both of these enzymes (Smith & DeLotto, 1994).

Abbreviations: APPI, amyloid β-protein precursor inhibitor domain; BAP, Bacillus alcalophilus alkaline protease; BLAP, Bacillus lentus alkaline protease; BPTI, bovine pancreatic trypsin inhibitor; CMK, chloromethyl ketone; HNE, human neutrophil elastase; hGH, human growth hormone; Nva, norvaline, a linear three-carbon side chain; PAI-1, plasminogen activator inhibitor 1; pNA, para-nitroanilide; PPE, porcine pancreatic elastase; PROK, Thermus album proteinase K; RMCPI and RMCPII, rat mast cell proteases I and II; SBPN, Bacillus amyloliquefaciens subtilisin BPN'; SCARL, Bacillus licheniformis subtilisin Carlsberg; SGPA, Streptomyces griseus protease A; SGPB, S. griseus protease B; SGPE, S. griseus protease E; SSI, Streptomyces subtilisin inhibitor; suc, succinyl; suc-AAPX-pNA, tetrapeptide amide substrates varying at the P1 position; suc-XAPF-pNA, tetrapeptide amide substrates varying at the P4 position; THERM, Thermus vulgaris thermitase; TPA, tissue plasminogen activator. Nomenclature for the substrate amino acid residues is Pn, ..., P2, P1, P1', P2', ..., Pn', where P1-P1' denotes the hydrolyzed bond. Sn, ..., S2, S1, S1', S2', ..., Sn' denote the corresponding enzyme binding sites.
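The P/S substrate nomenclature defined in the abbreviations can be made concrete: residues are numbered outward from the hydrolyzed bond, P1 on its N-terminal side and P1' on its C-terminal side, and the enzyme subsites S1, S1', and so on carry the same indices. The sketch assumes a plain one-letter peptide string; `subsite_labels` and its `scissile_after` argument (the number of residues before the cleaved bond) are hypothetical names.

```python
from typing import Dict


def subsite_labels(peptide: str, scissile_after: int) -> Dict[str, str]:
    """Label residues Pn ... P1 | P1' ... Pn' around the hydrolyzed bond.

    scissile_after = number of residues on the N-terminal side of the cleaved bond,
    so peptide[scissile_after - 1] is P1 and peptide[scissile_after] (if any) is P1'.
    """
    if not 0 < scissile_after <= len(peptide):
        raise ValueError("scissile_after must point inside the peptide")
    labels: Dict[str, str] = {}
    for i, residue in enumerate(peptide):
        if i < scissile_after:
            labels[f"P{scissile_after - i}"] = residue       # unprimed, counting toward the N terminus
        else:
            labels[f"P{i - scissile_after + 1}'"] = residue  # primed, counting toward the C terminus
    return labels


if __name__ == "__main__":
    # Residues of suc-AAPF-pNA (the suc-XAPF-pNA series with X = A); the bond to pNA is cleaved.
    print(subsite_labels("AAPF", 4))  # {'P4': 'A', 'P3': 'A', 'P2': 'P', 'P1': 'F'}
```

In this labeling, varying X in suc-XAPF-pNA changes only the P4 residue, which is the residue that occupies the enzyme's S4 subsite.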

An alternative rationale for the continued interest in serine proteases has been their emergence as one of the major paradigms for the understanding of enzymic rate enhancements and of structure-activity relationships. Until recently, all known serine proteases fell into one of two distinct structural classes: the chymotrypsin-like and subtilisin-like families (Matthews, 1977; Fig. 1A,B). However, the crystal structure of wheat serine carboxypeptidase II (Liao & Remington, 1990; Liao et al., 1992; Fig. 1C) reveals conservation of the essential features of the catalytic apparatus within a third distinct protein fold. This homodimeric enzyme possesses the α/β fold found also in a number of other enzymes that share hydrolytic activity as their only common feature (Ollis et al., 1992). The fold consists of an 11-stranded mixed β-sheet structure surrounded by 15 helices, with the active site located at the base of a deep bowl-shaped depression in the enzyme surface (Fig. 1C).

The three serine protease classes are distinguished by the absence of any conserved secondary and tertiary motifs, but in each case the catalytic serine and histidine residues maintain an identical geometric orientation (Fig. 1). To a lesser extent, adjacent groups that stabilize the transition state are also similarly arranged (Wright et al., 1969; Robertus et al., 1972a, 1972b; Liao et al., 1992). Thus, it appears that nature has arrived at the same biochemical mechanism by separate avenues: the chymotrypsin, subtilisin, and serine carboxypeptidase families of serine proteases are a classic example of convergent enzyme evolution (Matthews, 1977; Liao et al., 1992). The resemblance of serine carboxypeptidase to other members of the α/β-hydrolase fold family also indicates the operation of divergent evolution within this structural framework (Ollis et al., 1992). Further, a recently generated catalytic antibody has been characterized that catalyzes the stereoselective hydrolysis of norleucine and methionine phenyl esters (Guo et al., 1994). The crystal structure of this enzyme reveals the presence of a Ser-His catalytic dyad structurally similar to those of the other serine protease classes (Zhou et al., 1994). A similar catalytic mechanism is therefore suggested, indicating that the antibody fold may well be a fourth structural framework capable of supporting proteolytic activity in a serine protease-like fashion.

Fig. 1. Diversity of structural motifs in which the common catalytic apparatus of serine proteases is embedded. Shown are ribbon drawings of chymotrypsin (A), subtilisin BPN' (B), and wheat serine carboxypeptidase (C). α-Helices are shown as red cylinders and β-strands as yellow arrows. Secondary structures were determined by the algorithm of Kabsch and Sander (1983). Each enzyme possesses two common residues of crucial importance to catalysis: a nucleophilic Ser and an adjacent His, which functions as a general base (shown in white). Enzymes are oriented identically by superposition of the backbone atoms and Cβ of these two amino acids. A third member of the catalytic machinery is an aspartate residue (shown at left, also in white) not conserved in position relative to the Ser and His (compare serine carboxypeptidase with the other two enzymes). Lack of conservation in position of this residue suggests that the catalytic apparatus may be better viewed as a juxtaposition of Ser-His and His-Asp dyads, rather than as a single catalytic triad.

We consider here the structural and kinetic basis for the diversity of substrate specificity in the subtilisin- and chymotrypsin-class serine proteases. Emphasis is placed on those systems for which both crystallographic and detailed kinetic measurements are available. After a brief review of the common mechanism of the three classes and the role of mutational analysis in its further elucidation, we concentrate much of our attention on three enzymes: subtilisin BPN', α-lytic protease, and trypsin. In each case, an extensive structure-function analysis has been applied to address the roles of particular amino acids in contributing to the observed specificity profiles. The wealth of information available on the chemical and kinetic mechanisms of catalysis and the large database of homologous sequences provide an essential framework that supports these studies. Although the functional and/or structural properties of many of the mutant proteases can be given a relatively straightforward and objective description, there are also many examples where the data cannot be easily encapsulated. In these cases, some subjectivity in the description of kinetic and structural parameters is unavoidable, and other interpretations of the same data could yield different overall conclusions.

The catalytic mechanism

The vast majority of early studies on the serine proteases focused on the elucidation of the chemical and kinetic mechanisms of catalysis (reviewed by Bender & Killheffer, 1973; Blow, 1976; Kraut, 1977; Polgar, 1989). Hydrolysis of ester and amide bonds proceeds by an identical acyl transfer mechanism in all enzymes of the subtilisin and trypsin families (Fig. 2A,B,C). Michaelis complex formation is followed by attack on the carbonyl carbon atom of the scissile bond by the eponymous serine of the catalytic triad, which is enhanced in nucleophilicity by the presence of an adjacent histidine functioning as a general base catalyst. Proton donation by the histidine to the newly formed alcohol or amine group then results in dissociation of the first product and concomitant formation of a covalent acyl-enzyme complex. The deacylation reaction occurs via the same mechanistic steps, with the attacking nucleophile provided by a water molecule that approaches from the just-vacated leaving group side. Each step proceeds through a tetrahedral intermediate, which resembles in structure the high-energy transition state for both reactions. This mechanism is capable of accelerating the rate of peptide bond hydrolysis by a factor of more than 10^[illegible] relative to the uncatalyzed reaction (Kahne & Still, 1988).

[Figure 2 artwork; panel C headings: "Acylation rate-limiting" and "Deacylation rate-limiting"]

Fig. 2. Chemical and kinetic mechanisms of catalysis for serine proteases. The catalytic groups of trypsin (A) and subtilisin (B) are shown interacting with an oligopeptide substrate binding to the P1-P4 sites. (Nomenclature for the substrate amino acid residues is Pn, ..., P2, P1, P1', P2', ..., Pn', where P1-P1' denotes the hydrolyzed bond. Sn, ..., S2, S1, S1', S2', ..., Sn' denote the corresponding enzyme binding sites [Schechter & Berger, 1968].) Note the distinction in residues that form the oxyanion hole; in subtilisin, part of the interaction is made by an enzyme side chain. The binding site for the oligopeptide also differs; in subtilisin it forms the central strand of a three-stranded antiparallel β-sheet. The S1 site of trypsin and the S1 and S4 sites of subtilisin are the major sites where mutagenesis has been used to probe specificity. C: Common kinetic mechanism of catalysis for serine proteases indicating the meaning of the mechanistic rate constants and their relationship to the Michaelis parameters. The correct interpretation of kcat and KM differs depending on the rate-limiting step in catalysis, which varies among the different enzymes as well as among differing substrates of the same enzyme.
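The relationship between the mechanistic rate constants and the Michaelis parameters described for Fig. 2C can be sketched for the standard acyl-enzyme scheme E + S ⇌ ES → EA → E + P2, in which ES forms with dissociation constant Ks, acylation proceeds with rate constant k2, and deacylation with k3. The symbol names, the rapid-equilibrium assumption for ES, and the numerical constants in the usage lines are illustrative choices, not values read from the figure.

```python
from dataclasses import dataclass


@dataclass
class MichaelisParams:
    kcat: float          # s^-1
    km: float            # M
    kcat_over_km: float  # s^-1 M^-1


def acyl_enzyme_michaelis(ks: float, k2: float, k3: float) -> MichaelisParams:
    """Steady-state Michaelis parameters for E + S <=> ES -> EA -> E + P2.

    ks: dissociation constant of the Michaelis complex (M), assumed in rapid equilibrium
    k2: acylation rate constant (s^-1); the first product is released in this step
    k3: deacylation rate constant (s^-1)
    """
    kcat = k2 * k3 / (k2 + k3)
    km = ks * k3 / (k2 + k3)
    return MichaelisParams(kcat=kcat, km=km, kcat_over_km=kcat / km)


# Acylation rate-limiting (k2 << k3): kcat approaches k2 and KM approaches Ks
acylation_limited = acyl_enzyme_michaelis(ks=1e-3, k2=1.0, k3=1000.0)
# Deacylation rate-limiting (k3 << k2): kcat approaches k3 and KM falls to Ks * k3 / k2
deacylation_limited = acyl_enzyme_michaelis(ks=1e-3, k2=1000.0, k3=1.0)
print(acylation_limited)
print(deacylation_limited)
```

In both regimes kcat/KM reduces to k2/Ks, so this ratio reports on acylation even when deacylation limits kcat.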

Extensive structural evidence obtained from X-ray crystallographic and NMR investigations has provided conclusive corroboration of the essential features of this mechanism (reviewed by Steitz & Shulman, 1982). The investigations have been favored by the availability of good ground-state and transition-state substrate analogs, which have been used to obtain high-resolution images of these interactions. The scissile bond of the substrate is bound directly adjacent to the Ser-His catalytic couple in all the complexes studied. A strong hydrogen bond between these two amino acids, necessary for subsequent proton transfer, is formed only after substrate is bound. A binding site for the oxyanion of the intermediate is formed by the Gly 193 and Ser 195 backbone amide nitrogens in the chymotrypsin-like enzymes (Fig. 2A), by one amide nitrogen and the Asn 155 side chain in the subtilisin family (Fig. 2B), and by the backbone amides of Tyr 147 and Gly 53 in the serine carboxypeptidases (Liao et al., 1992). The interactions made in the S1-S4 enzyme sites (see Fig. 2 legend for substrate nomenclature) by the P1-P4 positions of substrate form an antiparallel β-sheet hydrogen bonding arrangement in the chymotrypsin and subtilisin families. Because the active site of wheat serine carboxypeptidase II does not possess similarly exposed peptide backbone groups, it seems likely that substrate binding N-terminal to the scissile bond will occur in a different fashion in this family (Liao et al., 1992). Another unique structural feature of carboxypeptidase is an extensive hydrogen bonding network that interacts with the C-terminal carboxylate of the substrate and is essential to its activity as an exopeptidase (Mortenson et al., 1994).
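The Schechter and Berger nomenclature referred to here (Fig. 2 legend) can be expressed as a small labeling helper. The function name and the example hexapeptide are hypothetical; the sketch only encodes the convention that residues are numbered outward from the scissile P1-P1' bond.

```python
from __future__ import annotations


def schechter_berger_labels(sequence: str, p1_index: int) -> dict[str, str]:
    """Label substrate residues Pn ... P2, P1 | P1', P2' ... Pn'.

    sequence: one-letter amino acid codes, N- to C-terminus
    p1_index: 0-based position of the P1 residue; the P1-P1' bond is the scissile bond
    The enzyme subsites that bind these residues are Sn ... S1 and S1' ... Sn'.
    """
    if not 0 <= p1_index < len(sequence) - 1:
        raise ValueError("the scissile bond must lie between two residues of the sequence")
    labels: dict[str, str] = {}
    for i, residue in enumerate(sequence):
        if i <= p1_index:
            labels[f"P{p1_index - i + 1}"] = residue   # N-terminal (acyl) side
        else:
            labels[f"P{i - p1_index}'"] = residue      # C-terminal (leaving-group) side
    return labels


# Hypothetical hexapeptide cleaved after Phe: A-A-P-F | G-L
labels = schechter_berger_labels("AAPFGL", p1_index=3)
print(labels)
```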

Mutational analysis of both subtilisin and trypsin has confirmed the essential roles of the catalytic serine and histidine (Ser 195 and His 57 in chymotrypsin numbering) in providing rate acceleration. Replacement of the catalytic Ser 221 and His 64 residues of subtilisin with alanine results in decreases of 10^4-10^6-fold in kcat (Carter & Wells, 1987, 1988). A decrease of 10^6-fold when the two residues are simultaneously replaced with alanine showed that the two catalytic moieties function in a highly cooperative manner: mutation of either component reduces activity to a baseline level. Similar results were obtained by analogous mutations of Ser 195 and His 57 in rat trypsin (Corey & Craik, 1992). This study also showed that enzyme variants such as H57K and H57E, which might provide an alternative general base, were ineffective, further underscoring the importance of the native catalytic triad geometry. These experiments, as well as others involving replacement of Ser 195 with Cys (Higaki et al., 1989; McGrath et al., 1989) and engineering of a metal-actuated activity switch involving His 57 (Higaki et al., 1990; McGrath et al., 1993), clarify the role of these active-site moieties. The mutational data are in agreement with early chemical modification experiments, which also indicated that Ser 195 and His 57 play crucial roles in catalysis (Dixon et al., 1956; Shaw et al., 1965).
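Fold changes of this kind are commonly converted into free-energy terms with ΔΔG‡ = RT ln(fold decrease), and the cooperativity argument can be phrased as a double-mutant cycle. The sketch below assumes 25 °C, and the fold values in the usage lines are illustrative round numbers rather than the measured constants. A coupling energy near zero would indicate independent contributions; a large negative value corresponds to the observation that removing both residues costs little more than removing either one.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def ddg_from_fold(fold_decrease: float, temp_k: float = 298.15) -> float:
    """Apparent loss of transition-state stabilization (kcal/mol) for a fold drop in activity."""
    return R_KCAL * temp_k * math.log(fold_decrease)


def coupling_energy(fold_a: float, fold_b: float, fold_ab: float, temp_k: float = 298.15) -> float:
    """Double-mutant-cycle interaction energy (kcal/mol); 0 means the residues act independently."""
    return ddg_from_fold(fold_ab, temp_k) - (ddg_from_fold(fold_a, temp_k) + ddg_from_fold(fold_b, temp_k))


for fold in (1e4, 1e6):
    print(f"{fold:.0e}-fold -> {ddg_from_fold(fold):.1f} kcal/mol")

# Illustrative values only: each single mutant and the double mutant ~1e6-fold slower
print(f"coupling: {coupling_energy(1e6, 1e6, 1e6):.1f} kcal/mol")
```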

The residual activity remaining in subtilisin after removal of the catalytic moieties was attributed to remaining binding determinants that stabilize the transition-state complex. One such determinant is provided by a hydrogen bonding interaction of Asn 155 with the oxyanion intermediate. Mutation of Asn 155 to a variety of other amino acids resulted in 10^[illegible]-10^[illegible]-fold decreases in kcat/KM (Bryan et al., 1986; Wells et al., 1986; Carter & Wells, 1990). This supports the proposals made on the basis of crystallographic studies, which suggested that a weak hydrogen bond to Asn 155 in the Michaelis complex is strengthened in the transition state (Robertus et al., 1972b; Poulos et al., 1976). Interestingly, mutation of Thr 220 of subtilisin showed that it stabilizes the transition state by 2 kcal/mol despite the fact that the side-chain Oγ lies 4.0 Å from the oxyanion, too far for a direct interaction (Braxton & Wells, 1991). One proposed explanation for the influence of Thr 220 is that dynamic fluctuations of the protein structure (Rao et al., 1987) cause transient direct interactions to occur. An alternative suggestion was that the oriented Thr 220 side-chain dipole may stabilize the transition state at a distance, by influencing the electrostatic potential in the active site. Significant perturbation of the pKa of the catalytic His 64 results from mutation of charged surface residues some 12-20 Å distant from the active site (Russell et al., 1987; Loewenthal et al., 1993). Similar mutation of distant charged residues affects the stability of complex formation with a transition-state analog inhibitor (Jackson & Fersht, 1993). These observations support the hypothesis that long-range electrostatic interactions may play a small but significant role in stabilizing the catalytic transition state.
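To put the energies in this paragraph on a common scale, the sketch converts 2 kcal/mol into an equivalent fold change and estimates a histidine pKa shift from a Debye-Hückel-screened Coulomb interaction with a distant charge. This crude continuum model is an illustration, not the calculation used in the cited studies; the dielectric constant of 80, the 9.6 Å Debye length (roughly 0.1 M ionic strength at 25 °C), and the assumption that only the protonated imidazolium interacts with the charge are all assumptions.

```python
import math

COULOMB_KCAL = 332.06   # kcal mol^-1 Å e^-2
R_KCAL = 1.987204e-3    # kcal mol^-1 K^-1


def fold_from_ddg(ddg_kcal: float, temp_k: float = 298.15) -> float:
    """Rate or affinity factor corresponding to a free-energy difference."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


def screened_coulomb(q1: float, q2: float, r_angstrom: float,
                     dielectric: float, debye_length: float) -> float:
    """Debye-Hückel-screened Coulomb energy (kcal/mol) between two point charges."""
    return COULOMB_KCAL * q1 * q2 / (dielectric * r_angstrom) * math.exp(-r_angstrom / debye_length)


def pka_shift_on_removal(q_removed: float, r_angstrom: float, dielectric: float = 80.0,
                         debye_length: float = 9.6, temp_k: float = 298.15) -> float:
    """His pKa change (mutant minus wild type) when a charge q_removed at distance r is removed.

    Assumes only the protonated (+1) imidazolium interacts with the charge, so removing
    a negative charge lowers the pKa and removing a positive charge raises it.
    """
    e_int = screened_coulomb(+1.0, q_removed, r_angstrom, dielectric, debye_length)
    return e_int / (math.log(10) * R_KCAL * temp_k)


print(f"2 kcal/mol ~ {fold_from_ddg(2.0):.0f}-fold")
for r in (12.0, 20.0):
    print(f"remove Asp/Glu at {r:.0f} Å: dpKa ~ {pka_shift_on_removal(-1.0, r):+.3f}")
```

With these assumptions the estimated shifts are on the order of a tenth of a pH unit or less and fall off with distance; a lower effective dielectric constant would give larger values.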

Considerable controversy has surrounded the role of an additional component of the catalytic apparatus, a conserved buried aspartate residue first described in the crystal structure of chymotrypsin (Matthews et al., 1967; Blow et al., 1969). Mutation of this residue confirmed its essential role, because all variants of trypsin and subtilisin in which the aspartate is absent are decreased in catalytic efficiency by at least a factor of 10^[illegible] (Craik et al., 1987; Sprang et al., 1987; Carter & Wells, 1988; Corey & Craik, 1992). The early suggestion of a two-proton transfer model, in which the Asp accepts a proton to become uncharged in the transition state, now appears to be unsupported by the bulk of the experimental (Bachovchin & Roberts, 1978; Markley, 1979; Kossiakoff & Spencer, 1981) as well as theoretical (Warshel et al., 1989) evidence. One role for the conserved Asp appears to be ground-state stabilization of the required tautomer and rotamer of the catalytic His (Craik et al., 1987; Sprang et al., 1987). In addition, because the His imidazole ring acquires a proton in the transition state, the Asp carboxylate can compensate for the developing positive charge. Its role may therefore be considered similar to that of the hydrogen bond donor groups in the oxyanion hole, which compensate the developing negative charge on the substrate carbonyl oxygen atom (Warshel et al., 1989; Fig. 2A,B). Experimental evidence for the role of electrostatic stabilization of the trypsin transition state has been obtained by mutation of the conserved Ser 214, which forms a solvent-inaccessible hydrogen bond to Asp 102, to various charged and uncharged amino acids (McGrath et al., 1992). Decreases in the free energies of catalysis were in agreement with electrostatic calculations, based on crystal structures of the mutants, which predicted these losses of activity.

Comparative analysis of the structures of chymotrypsin, subtilisin, and serine carboxypeptidase shows that the precise geometric orientation of the Asp is not conserved relative to the Ser-His catalytic dyad (Liao et al., 1992; compare Fig. 1A,B,C). In contrast to chymotrypsin and subtilisin, the plane of the Asp carboxylate in carboxypeptidase is tilted far out of the plane of the His imidazole, such that the His-Asp hydrogen bond is 45° out of the carboxylate plane. This geometry is unfavorable for proton transfer from His to Asp and provides further evidence against the double proton-transfer mechanism. A detailed analysis of high-resolution subtilisin structures also showed differences in the Asp-His hydrogen bonding relative to trypsin (McPhalen & James, 1988). It now appears that the Asp can occupy virtually any position relative to the Ser-His dyad. Therefore, it may be more accurate to regard the serine protease catalytic machinery as two dyads, Ser-His and His-Asp, that operate in concert, rather than as a single catalytic triad (Liao et al., 1992). In this context, it is of interest to note that relocation of the Asp 102 carboxylate group to position 214 in trypsin significantly reconstitutes the activity lost in the variants D102S and D102N (Corey et al., 1992). The crystal structure of this mutant shows that Asp 214 still interacts with His 57, but in an altered geometric orientation in which the plane of the carboxylate is displaced from that of the imidazole ring by 40°. The relatively high catalytic efficiency of this variant thus supports the view of the catalytic apparatus as a juxtaposition of two dyads.
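An interplanar angle such as the 40° displacement of the carboxylate plane from the imidazole plane is obtained by comparing the normals of two planes defined by atomic coordinates. A minimal pure-Python sketch follows; it assumes each plane is defined by three atoms (for example, CG, OD1, and OD2 of Asp, and three ring atoms of His), whereas a careful analysis would fit a least-squares plane to all ring atoms. The coordinates in the usage lines are made up to give a 40° tilt.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def plane_normal(p1: Vec, p2: Vec, p3: Vec) -> Vec:
    """Unit normal of the plane through three (non-collinear) atoms."""
    n = _cross(_sub(p2, p1), _sub(p3, p1))
    length = math.sqrt(_dot(n, n))
    return (n[0] / length, n[1] / length, n[2] / length)


def interplanar_angle(plane_a: Sequence[Vec], plane_b: Sequence[Vec]) -> float:
    """Angle in degrees (0 to 90) between two planes, each given by three atoms."""
    cos_angle = abs(_dot(plane_normal(*plane_a), plane_normal(*plane_b)))
    return math.degrees(math.acos(min(1.0, cos_angle)))


# Hypothetical coordinates (Å): imidazole in the xy plane, carboxylate tilted 40° about x
tilt = math.radians(40.0)
imidazole = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
carboxylate = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, math.cos(tilt), math.sin(tilt))]
angle = interplanar_angle(imidazole, carboxylate)
print(f"{angle:.1f} degrees")
```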

Substrate specificity in the subtilisin family 

The catalytic machinery and substrate binding clefts of the subtilisin-class serine proteases are embedded in a single-domain molecule (Wright et al., 1969; McPhalen & James, 1988). Six crystal structures are available in this family: Bacillus amyloliquefaciens subtilisin BPN' (Novo) (Wright et al., 1969; McPhalen & James, 1988), Bacillus licheniformis subtilisin Carlsberg (Bode et al., 1986a; McPhalen & James, 1988), Thermoactinomyces vulgaris thermitase (Gros et al., 1989), Tritirachium album proteinase K (Betzel et al., 1988), Bacillus lentus alkaline protease (Betzel et al., 1992), and Bacillus alcalophilus alkaline protease (van der Laan et al., 1992). The central core of the globular, heart-shaped molecule is formed by a seven-stranded parallel β-sheet (Fig. 1B). Nine α-helices are packed against the sheet in a mostly antiparallel fashion relative to the β-strands; seven of these are on the same face and form the larger of two subdomains defined on either side (McPhalen & James, 1988). A two-stranded antiparallel β-sheet is also formed in the larger subdomain near the C-terminus of the chain. The active site is located in the larger subdomain adjacent to the central β-sheet; the catalytic Ser 221 is found near the amino-terminus of a long α-helix, which follows the small antiparallel sheet (Fig. 1B; McPhalen & James, 1988; the numbering system for SBPN is used throughout).

Nearly all of the secondary structure elements of the enzymes are very highly conserved. A central core of 194 amino acids, which contains nearly all of the conserved α-helices and β-strands, has been defined by comparison of the known structures (Siezen et al., 1991). The fungal-derived PROK deviates most significantly in structure but still superimposes these equivalent Cα atoms with an RMS deviation of about 0.9 Å (the prokaryotic enzymes superimpose at 0.4 Å to 0.65 Å; Siezen et al., 1991). If PROK is omitted, a more extended core of 232 amino acids can be defined among the bacterial species of known structure. An extensive sequence comparison of 47 subtilisin-class enzymes showed a subdivision into two subclasses, based on conserved differences in certain parts of the alignment. SBPN, SCARL, THERM, BAP, and BLAP are members of subclass I; the structurally divergent PROK is a representative of subclass II (Siezen et al., 1991). Although the homologous catalytic core of some 270 amino acids is found in all subtilisins, some of the enzymes possess large insertions in this domain, and many also possess C-terminal extensions, resulting in polypeptide chains as long as 1,775 amino acids. This large database of sequence information forms the basis for homology modeling of those enzymes for which no tertiary structure is available (Siezen et al., 1991, 1993).
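The RMS deviations quoted by Siezen et al. (1991) come from superimposing equivalent Cα atoms. A minimal sketch of the standard Kabsch least-squares superposition follows; it assumes NumPy is available and that residue equivalences have already been established by a structural alignment, and it is not the software used in the cited study. The usage lines check it on synthetic coordinates that differ only by a rotation and a translation.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD (Å) between two N x 3 arrays of equivalent atoms after optimal superposition."""
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays of equivalent atoms")
    p_c = p - p.mean(axis=0)                  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                           # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # -1 would indicate a reflection; correct it
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = (rotation @ p_c.T).T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Synthetic check: a rotated and translated copy should superimpose with RMSD ~ 0
rng = np.random.default_rng(0)
ca_a = rng.normal(size=(50, 3)) * 10.0
theta = np.radians(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
ca_b = ca_a @ rot_z.T + np.array([5.0, -3.0, 2.0])
print(kabsch_rmsd(ca_a, ca_b))
```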

Crystal structures of enzyme-inhibitor complexes have identified substrate binding determinants extending over nine amino acids, from P6 to P3'. The structures include several peptide chloromethyl ketone complexes, in which subsites S1-S3 are occupied (Robertus et al., 1972a; Poulos et al., 1976), as well as complexes of SCARL with the protein inhibitor eglin C (Bode et al., 1986a; McPhalen & James, 1988); SBPN with eglin C, chymotrypsin inhibitor 2, and Streptomyces subtilisin inhibitor (Bode et al., 1986a; McPhalen & James, 1988; Takeuchi et al., 1991a, 1991b); THERM complexed to eglin C (Gros et al., 1989); and PROK complexed with peptide inhibitors (Betzel et al., 1993). In each of these complexes, the inhibitor chain binds in a surface channel of the enzyme, which accommodates six residues from P4 to P2'. On the N-terminal side of the scissile bond, the P1-P4 residues of the substrate main chain are invariably inserted between two β-strands of the enzyme at positions 125-127 and 100-102 (Fig. 2B). The substrate thus forms the central strand of a three-stranded antiparallel sheet unique to the subtilisins; in the chymotrypsin-like proteases, this structure is not formed because only the strand corresponding to residues 125-127 is present (Fig. 2A).

Subtilisins in general show broad substrate specificity profiles and often display a preference for large hydrophobic groups at position P1 (Markland & Smith, 1971). At this position, specificity arises from a broad, open S1 binding cleft formed on one side by the two β-strands, which interact with the P1-P4 substrate residues, and on the other by a loop comprising residues 155-166 (Fig. 3). This loop varies in size among members of the family (Siezen et al., 1991). In SBPN, two different modes of binding exist to accommodate either P1-Phe or P1-Lys substrates (Robertus et al., 1972a; Poulos et al., 1976). The Phe ring binds deeply in the S1 cleft near Gly 166, whereas the charged Lys extends across the cleft to form a salt bridge with Glu 156. A prominent hydrophobic cavity is also present for binding of the P4 substrate side chain (Fig. 3). These two sites have been the focus of much of the work on substrate specificity. Interactions made in the more distal sites influence catalytic efficiency markedly, and there is evidence for nonadditivity of mutational effects, suggesting functional communication between sites (Gron & Breddam, 1992).

Interactions in the S1 site

The most intensively studied member of the subtilisin family is SBPN, which has been the subject of extensive protein engineering investigations (reviewed in Wells et al., 1987b; Wells & Estell, 1988). The enzyme efficiently cleaves peptidyl amide substrates possessing a broad range of P1 amino acids, with the kcat/KM value showing a linear dependence on the hydrophobicity of the substrate side chain. The preference of the enzyme at this position is roughly Tyr, Phe > Leu, Met, Lys > His, Ala, Gln, Ser ≫ Glu, Gly (Estell et al., 1986; Wells et al., 1987c). To investigate the role of hydrophobicity more closely, 12 different amino acids were substituted for Gly 166, which lies at the base of the pocket (Fig. 3). Analysis of the mutants showed that an increase in the side-chain volume at this position, which consequently decreases the size of the S1 cleft, caused substantial reductions (up to 5,000-fold) in kcat/KM toward large P1 amino acids. This presumably occurs because steric repulsion predominates over the favorable effect of a more hydrophobic pocket. Catalytic efficiencies toward small P1 side chains were increased by up to 10-fold in these variants. An optimal combined volume of 160 Å³ for the S1 and P1 side chains was estimated from these data. It appears that hydrophobicity of the S1 site is the main driving force for specificity, whereas other effects, such as attractive van der Waals forces and hydration of polar side chains, have a lesser though still significant role.

Fig. 3. Structure of the S1 and S4 sites of subtilisin BPN' showing binding of a peptide derived from the cocrystal structure with Streptomyces subtilisin inhibitor. An α-carbon trace of the protein is shown in thin blue lines. Catalytic residues are in yellow, and the inhibitor chain is in green with the P1 and P4 side chains labeled in blue. Locations of amino acids at which the S1 and S4 sites have been mutated are indicated in red. In the subtilisin family, both the S1 and S4 sites are generally specific for hydrophobic side chains, but Glu 156 in the S1 site of subtilisin BPN' provides activity toward P1-Lys side chains as well. At both sites, specificity alteration is readily achievable by the substitution of a small number of residues directly in contact with substrate. Modulation of the hydrophobic specificity profiles has been achieved at both sites, and altered specificity toward charged residues has been achieved in the S1 pocket.
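A dependence of activity on side-chain hydrophobicity, as described above for P1 substrates, is commonly tested by regressing log10(kcat/KM), which scales with the transition-state free energy, against a hydrophobicity scale. The sketch below is an ordinary least-squares fit under that assumption; the choice of scale is left to the user, and the numbers in the usage lines are synthetic, not data from Estell et al. (1986).

```python
import math
from typing import Sequence, Tuple


def fit_log_efficiency(hydrophobicity: Sequence[float],
                       kcat_over_km: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of log10(kcat/KM) = slope * h + intercept.

    Returns (slope, intercept, r_squared).
    """
    if len(hydrophobicity) != len(kcat_over_km) or len(hydrophobicity) < 2:
        raise ValueError("need at least two paired observations")
    y = [math.log10(k) for k in kcat_over_km]
    n = len(y)
    mean_x = sum(hydrophobicity) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in hydrophobicity)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(hydrophobicity, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * x + intercept)) ** 2 for x, yi in zip(hydrophobicity, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared


# Synthetic data lying exactly on log10(kcat/KM) = 0.5 * h + 2
hydro = [0.0, 1.0, 2.0, 3.0]
efficiency = [10 ** (0.5 * h + 2.0) for h in hydro]
print(fit_log_efficiency(hydro, efficiency))
```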

Because these studies showed that specificity is easily modulated by replacing amino acids directly contacting substrate, it seemed plausible that more distant portions of the enzyme structure might be of little importance. This idea was further explored by a mutational study in which several amino acids from the related SCARL enzyme were exchanged for those in SBPN (Wells et al., 1987a). Although these two enzymes differ by 31% in sequence, only three substitutions lie within 7 Å of the S1 pocket. Two of these, at positions 156 and 217 (Fig. 3), directly contact substrate (residue 217 is in the S1' site). A third residue at position 169 is positioned behind the loop comprising residues 156-166, which forms one side of the S1 pocket. In SBPN the amino acids are Glu 156, Gly 169, and Tyr 217; these were replaced by the analogous Ser 156, Ala 169, and Leu 217 of SCARL. The wild-type enzymes differ by factors of 6- to 60-fold in their kcat/KM values toward peptidyl amide substrates possessing P1-Glu, Met, Phe, Gln, or Ala; in each case, SBPN is more efficient (Wells et al., 1987a).

The triple mutant E156S/G169A/Y217L was found to exhibit a substrate specificity profile very similar to that of SCARL. Cleavage at each of the P1 amino acids tested occurred with efficiencies within threefold of those of the target protease (Wells et al., 1987a). These data demonstrate that, of the 86 amino acid differences between the two enzymes, three alone are largely sufficient to determine the differences in specificity. Further, analysis of singly and doubly substituted variants showed that the E156S mutation alone is almost entirely responsible for the shift in specificity profile. Because the activity of the E156S/Y217L enzyme was found to be within twofold of that of the triple mutant, it appears that P1 substrate specificity is in fact locally determined to a significant degree.
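The "within threefold" criterion used to compare the triple mutant with SCARL can be written as the largest fold difference in kcat/KM over the P1 residues tested in both profiles. The helper and the values in the usage lines are hypothetical and serve only to make the comparison explicit.

```python
from typing import Mapping


def max_fold_difference(profile: Mapping[str, float], reference: Mapping[str, float]) -> float:
    """Largest fold difference in kcat/KM (either direction) over shared P1 residues."""
    shared = profile.keys() & reference.keys()
    if not shared:
        raise ValueError("profiles share no P1 residues")
    return max(max(profile[aa] / reference[aa], reference[aa] / profile[aa]) for aa in shared)


# Hypothetical kcat/KM values (s^-1 M^-1) for a mutant and a target enzyme
mutant = {"Glu": 30.0, "Phe": 900.0}
target = {"Glu": 10.0, "Phe": 1000.0}
fold = max_fold_difference(mutant, target)
print(fold, fold <= 3.0)
```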

The behavior of the E156S variant is similar to that of other mutant SBPN enzymes also possessing electrostatic substitutions in the S1 site (Table 1; Wells et al., 1987c). Sixteen variants were constructed at positions 156 and 166, each of which altered the electrostatic potential of the S1 site by introducing or removing Arg, Lys, Glu, or Asp residues at one or both sites. Analysis of the mutants showed that increases as high as 10^3-fold in kcat/KM toward complementary charged substrates could be achieved. To assess the contribution of electrostatic free energy to the stabilization of the transition-state complex, parallel substitutions of roughly isosteric but uncharged residues (Met replacing Lys; Gln replacing Glu) were also made. For example, it was found that increasing the positive charge in the S1 site increases kcat/KM much more for P1-Glu than for P1-Gln substrates. In this way, substrate binding effects associated solely with the charge-charge interaction could be isolated.

Table 1. Engineering electrostatic interactions in subtilisin (a)

Variant (156/166)   Net charge (156 + 166)   P1-Glu    P1-Lys
E156/D166           -2                                 16,200
E156/N166           -1                       40        17,800
E156/Q166           -1                       16        12,600
S156/D166           -1                       17        17,400
E156/G166 (wt)      -1                       35        39,800
Q156/G166            0                       620       1,070
Q156/N166            0                       110       5,600
E156/R166            0                       810       1,550
Q156/K166           +1                       66,000    1,700
S156/K166           +1                       16,200    5,400

(a) Substrate: suc-Ala-Ala-Pro-Glu/Lys-pNA. Values are kcat/KM in s^-1 M^-1.
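The Table 1 values can be converted into fold changes and apparent free-energy changes relative to wild-type SBPN (E156/G166). The sketch assumes 25 °C, which the table does not state, and omits E156/D166 because its P1-Glu entry is blank in the source. Because Table 1 does not list the isosteric Gln/Met controls, this calculation alone does not separate the charge-charge term from other effects in the way the text describes.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1
TEMP_K = 298.15       # assumed temperature; not stated in Table 1

# Table 1: variant -> (net charge at 156/166, kcat/KM P1-Glu, kcat/KM P1-Lys), s^-1 M^-1.
# E156/D166 is omitted because its P1-Glu entry is blank in the source.
TABLE_1 = {
    "E156/N166": (-1, 40, 17_800),
    "E156/Q166": (-1, 16, 12_600),
    "S156/D166": (-1, 17, 17_400),
    "E156/G166": (-1, 35, 39_800),   # wild type
    "Q156/G166": (0, 620, 1_070),
    "Q156/N166": (0, 110, 5_600),
    "E156/R166": (0, 810, 1_550),
    "Q156/K166": (1, 66_000, 1_700),
    "S156/K166": (1, 16_200, 5_400),
}
WILD_TYPE = "E156/G166"
COLUMN = {"Glu": 1, "Lys": 2}


def fold_vs_wild_type(variant: str, p1: str) -> float:
    """Ratio of variant to wild-type kcat/KM for the given P1 residue ('Glu' or 'Lys')."""
    col = COLUMN[p1]
    return TABLE_1[variant][col] / TABLE_1[WILD_TYPE][col]


def ddg_vs_wild_type(variant: str, p1: str) -> float:
    """Apparent transition-state free-energy change (kcal/mol); negative means stabilized."""
    return -R_KCAL * TEMP_K * math.log(fold_vs_wild_type(variant, p1))


for name, (charge, _, _) in TABLE_1.items():
    print(f"{name:10s} charge {charge:+d}  "
          f"Glu {ddg_vs_wild_type(name, 'Glu'):+.1f}  Lys {ddg_vs_wild_type(name, 'Lys'):+.1f} kcal/mol")
```

For example, Q156/K166 raises kcat/KM toward P1-Glu about 1,900-fold (about -4.5 kcal/mol) while lowering it toward P1-Lys (about +1.9 kcal/mol).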

Several of the S1-site specificity variants were also utilized in a different study that addressed the ability of SBPN to function as a peptide ligase (Abrahmsen et al., 1991). This reaction occurs when peptides bearing a free amino-terminal group can compete effectively with water for attack on the acyl-enzyme intermediate. The intrinsically low level of ligase activity normally present in SBPN was enhanced by substitution of the active-site Ser 221 by Cys, which shifts the relative preference toward aminolysis by more than 10^[illegible]-fold (Nakatsuka et al., 1987). The additional mutation P225A improves ligase activity by a further 10-fold (Abrahmsen et al., 1991). The usefulness of this SBPN variant (referred to as subtiligase) for the synthesis of proteins was improved by introducing the specificity variants G166I, G166E, and E156Q/G166K into the S221C/P225A framework. Preferred ligation of P1-Glu, P1-Phe, P1-Lys, and P1-Arg esters was achieved; the specificity for ligation mirrored that for cleavage of peptidyl amide substrates (Estell et al., 1986; Wells et al., 1987c). The ability to modulate the S1-site specificity thus provides greater flexibility in the choice of ligation junctions. Subtiligase has been used to synthesize ribonuclease A and active-site variants of this enzyme by stepwise ligation of six esterified peptide fragments 12-30 residues long (Jackson et al., 1994).
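The competition between a peptide amine and water for the acyl-enzyme can be described as a simple kinetic partition. This is a generic two-pathway sketch with hypothetical rate constants, not a model fitted to subtiligase; it shows how a 10-fold shift in the aminolysis/hydrolysis preference, of the size attributed to P225A, can move the ligated fraction substantially when the nucleophile is dilute.

```python
def aminolysis_fraction(k_aminolysis: float, amine_conc_m: float, k_hydrolysis: float) -> float:
    """Fraction of acyl-enzyme captured by the peptide amine rather than by water.

    k_aminolysis: second-order rate constant for attack by the amine (M^-1 s^-1), hypothetical
    amine_conc_m: concentration of the nucleophilic peptide (M)
    k_hydrolysis: pseudo-first-order rate constant for attack by water (s^-1), hypothetical
    """
    rate_amine = k_aminolysis * amine_conc_m
    return rate_amine / (rate_amine + k_hydrolysis)


baseline = aminolysis_fraction(k_aminolysis=100.0, amine_conc_m=1e-3, k_hydrolysis=1.0)
improved = aminolysis_fraction(k_aminolysis=1000.0, amine_conc_m=1e-3, k_hydrolysis=1.0)
print(f"{baseline:.2f} -> {improved:.2f}")
```

With the illustrative values shown, the ligated fraction rises from about 9% to 50%.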

Substrate-assisted catalysis 

Substrate-assisted catalysis represents a strategy for enhancing the specificity of proteolytic cleavage. Subtilisins lacking the catalytic His 64 can be reconstituted by including a histidine residue within the substrate (Carter & Wells, 1987; Carter et al., 1989, 1991). By placing a His at the P2 position of peptidyl amide substrates, specificity of up to 400-fold was achieved relative to analogous P2-Gln and P2-Ala substrates. The increased specificity at position P2 occurs within the context of a compromised enzyme; H64A subtilisin is reduced 10^[illegible]-fold in kcat/KM, and H64A in the presence of a P2-His substrate remains 5,000-fold less efficient than the wild-type enzyme (Carter & Wells, 1987). Mutation of Ser 221, Asp 32, and Asn 155 in the context of H64A suggested that interactions of the catalytic His with the Ser and Asp residues are severely compromised when the His is present in the substrate (Carter et al., 1991). By contrast, the oxyanion hole interactions appear much less disrupted. Model-building of P2-His substrates indicates that the imidazole ring can occupy roughly the same position as that of His 64 in the native enzyme, although some deviation in hydrogen bond distances and angles exists, which may partially explain the reduced activity.

The large database of S1-site specificity variants was again used to enhance the selectivity of proteolytic cleavage by the prototype H64A enzyme (Carter et al., 1989). For example, a 20-fold improvement in cleavage of suc-FAHY-pNA was observed by introducing the S1- and S1'-site mutations E156S, G169A, and Y217L (Estell et al., 1986; Wells et al., 1987c), which increase catalytic efficiency toward P1-Phe and P1-Tyr substrates. The additional mutation G166A enhanced specificity for P1-Phe but not P1-Tyr substrates, as expected because the Cβ of Ala 166 appears to cause steric hindrance to the binding of the larger Tyr side chain. Little specificity was observed on the C-terminal side of the peptide bond in the cleavage of peptide substrates. The mutant subtilisins have been shown to selectively cleave designed target sites in fusion proteins, even under adverse conditions, making them a useful additional tool in the repertoire of protein chemists (Carter et al., 1989).
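The target-site design described above (P2-His with P1-Phe or P1-Tyr, as in suc-FAHY-pNA, and little specificity on the C-terminal side) suggests a simple sequence scan for candidate cleavage sites in a fusion protein. The scanning rule, the function name, and the example sequences are hypothetical simplifications; real cleavage also depends on other subsites such as S4 and on the accessibility of the site.

```python
import re
from typing import List


def candidate_cleavage_sites(sequence: str, p1_residues: str = "FY") -> List[int]:
    """0-based indices of P1 residues in His-P1 motifs (cleavage after P1).

    Hypothetical rule: P2 must be His, P1 must be one of p1_residues, and at least one
    residue must follow P1 so that a P1' exists. Overlapping sites are all reported.
    """
    if not (p1_residues.isalpha() and p1_residues.isupper()):
        raise ValueError("p1_residues must be one-letter amino acid codes")
    pattern = re.compile(rf"(?=H[{p1_residues}].)")   # lookahead allows overlapping matches
    return [m.start() + 1 for m in pattern.finditer(sequence)]


print(candidate_cleavage_sites("GSFAHYAEGT"))   # hypothetical linker containing FAHY
```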

Further insight into substrate-assisted catalysis was provided by a novel approach using phage display technology (Matthews & Wells, 1993; Fig. 4A). A randomized target substrate sequence for an improved H64A subtilisin (Carter et al., 1989) was inserted between an amino-terminal affinity domain, a variant of human growth hormone (hGH), and the carboxy-terminal domain of the M13 phage gene III coat protein. A collection of phage particles bearing different substrate sequences is bound to immobilized hGH-binding protein and cleaved by subtilisin, so that phage bearing good substrate sequences are eluted and those bearing poor sequences remain bound. Propagation of the phage further enriches for efficient or inefficient cleavage sites. Analysis of the sequences that were efficiently cleaved revealed that P1'-His- as well as P2-His-containing substrates could function in substrate-assisted catalysis. Analysis of cleavage of fusion proteins linked to alkaline phosphatase, which provides an easily assayed activity, suggested that P1'-His-mediated cleavage was comparable in efficiency to P2-His cleavage. Further study of P1'-His cleavage would be informative, because release of the leaving group after formation of the acyl-enzyme implies that no catalytic His is present to assist in deacylation. Molecular modeling has shown that a P1'-His can also occupy the position vacated by His 64 in an H64A variant (Matthews & Wells, 1993).
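Each round of binding, cleavage, elution, and propagation multiplies the odds of a good substrate sequence by the ratio of its elution probability to that of a poor one. The deterministic sketch below assumes fixed per-round elution probabilities and unbiased amplification; the parameter values are hypothetical, not estimates from Matthews & Wells (1993).

```python
def enrichment_after_rounds(initial_fraction: float, elute_good: float,
                            elute_poor: float, rounds: int) -> float:
    """Fraction of good-substrate phage after repeated cleave-elute-amplify cycles.

    Assumes good-substrate phage are eluted with probability elute_good and poor-substrate
    (background) phage with probability elute_poor each round, followed by unbiased amplification.
    """
    if not 0.0 < initial_fraction < 1.0 or elute_good <= 0.0 or elute_poor <= 0.0:
        raise ValueError("fractions and elution probabilities must be positive")
    f = initial_fraction
    for _ in range(rounds):
        good = f * elute_good
        poor = (1.0 - f) * elute_poor
        f = good / (good + poor)
    return f


for n in range(4):
    print(n, round(enrichment_after_rounds(1e-4, 0.5, 0.005, n), 4))
```

Starting from one good sequence in 10,000 with a 100-fold elution advantage, about three rounds are enough for good sequences to dominate the pool.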
Considerable specificity toward substrate residues distant from the scissile bond exists in the subtilisin-class family. A thorough mapping of the preferences of two enzymes, SBPN and BLAP, shows that the most marked distal interaction occurs on the N-terminal side of the substrate at the S4 enzyme site (Gron et al., 1992). Mutational analysis at this position has been applied to three of the enzymes of known structure: SBPN (Bder et al., 1993; Rheinnecker et al., 1993, 1994), BLAP (Bech et al., 1992, 1993; Sorensen et al., 1993), and BAP (Teplyakov et al., 1992). The S4 site is formed from the juxtaposition of two structural elements: residues 100-107 at the amino-terminus of an α-helix in the small subdomain and residues 125-132 in an adjacent surface loop. Substrate interactions include both the main-chain β-sheet hydrogen bonds and contacts with the side chains of residues 104, 107, 126, and 135, which line the sides and base of the site (Fig. 3). Of the amino acids shaping the cleft, only Gly 127 is invariant in the family (Siezen et al., 1991).

In SBPN, the amino acid side chains in the S4 site are Tyr 104, Ile 107, and Leu 126, which create a large hydrophobic pocket. Accordingly, the substrate preferences follow the series Phe > Leu, Ile, Val > Ala for cleavage of peptidyl amide substrates (Rheinnecker et al., 1993). Slightly different preferences following the same general trend were observed toward long peptides occupying subsites S5-S5' (Gron et al., 1992). However, the kcat/KM values vary only over a three- to sixfold range. It was suggested that the small variability might be due to compensatory shrinkage of the S4 site upon binding of smaller side chains (Takeuchi et al., 1991a). Efficiencies toward polar resi-
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H^. 4. Randomixaiion methodologies employed in isolation of serine 
protease substrate specificity muiants. A: "Substrate phage" approach 
applied to .subtilisin. In this method, the sequence of the substrate rather 
than the enzyme is varied to explore the substrate specificity at many 
of the subsites. Uy using H64A subtilisin as the cleaving protease, it was 
discovered thnt substrate-assisted catalysis functions when the substrate 
His is present ai the PI' as well as the P2 position. Note that in phage 
display systems, the phage particle provides a "package" in which the 
nuituni DNA and variant protein are physically linked. This facilitates 
analysis after enrichment of \ hose phage hearing good substrate sequences. 
B: Genetic selection for the isolation of trypsin variants. Periplasmic CN- 
pression of a variant trypsin capable of cleaving the nonnutriiive Arg- 
X substrate (1.2) leads to release of free Arg (3), which enters the 
cytoplasm and relieves au.soirophy. Twenty variant trypsins possessing 
altered Arg/Lys specificity ratios have been isolated in this maimer. C: 
Phage display approach for the isolation of trypsin variants. A wild-type 
trypsin gene fused lo the MI3 gene III coat protein specifically binds 
immobili/cd ecotin, a dimcric protein inhibitor of mammalian serine pro- 
leases thai is found in the bacterial periplasm. 
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dues are decreased by more than 100-fold relative to hydropho- 
bic amino acids (Gr0n et al., 1992). 

Tyr 104, Ile 107, and Leu 126 were mutated singly and in combination to amino acids that in every case were smaller than the wild-type residue. The following variant enzymes were characterized kinetically toward amide substrates of the form suc-XAPF-pNA: Y104F, Y104A; I107G, I107A, I107V; L126G, L126A, L126V; and the double mutants I107G/Y104A, I107G/L126A, and I107G/L126V (Rheinnecker et al., 1993, 1994). These alterations test the effects of enlarging the S4 pocket as well as the consequences of deleting a hydrogen bond present between the side chains of Tyr 104 and Ser 130.

It was found that the Tyr 104-Ser 130 hydrogen bond has little effect on enzyme efficiency or specificity: Y104F SBPN hydrolyzes P4-Ala, Val, Ile, Leu, and Phe substrates nearly identically to the wild-type enzyme. The effect of introducing Ala at this position is similar to that caused by decreasing the size of Ile 107: in each case specificity is increased for residues possessing large side chains at P4. Among the single mutants at positions 104 and 107, the largest improvements in specificity for P4-Phe over P4-Ala are roughly 200-fold, for both Y104A and I107G. For these variants, the effects are achieved by maintaining approximately wild-type levels of kcat/Km toward Phe and sharply decreasing efficiencies toward Ala and the other smaller substrate residues. Mutation of Leu 126 had smaller effects on relative specificities but caused large decreases in kcat/Km, in the range of 10- to 10⁴-fold, with decreased efficiency correlated with decreasing size of the side chain.
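The "relative specificity" quoted for Y104A and I107G is a ratio of ratios. The first ratio is kcat/Km for the P4-Phe substrate divided by kcat/Km for the P4-Ala substrate. That ratio for the mutant is then compared with the same ratio for wild-type. The sketch below computes this from hypothetical kcat/Km values in M⁻¹ s⁻¹. These values are invented, not the published constants. They were chosen only to show how holding the Phe efficiency fixed while lowering the Ala efficiency produces a 200-fold shift. The function names are illustrative.

```python
def specificity_ratio(kcat_km: dict[str, float], favored: str, reference: str) -> float:
    """kcat/Km for the favored P4 residue divided by kcat/Km for the reference residue."""
    return kcat_km[favored] / kcat_km[reference]


def specificity_shift(wild_type: dict[str, float], mutant: dict[str, float],
                      favored: str = "Phe", reference: str = "Ala") -> float:
    """Fold change in relative specificity of a mutant compared with wild-type."""
    return (specificity_ratio(mutant, favored, reference)
            / specificity_ratio(wild_type, favored, reference))


# Hypothetical kcat/Km values (M^-1 s^-1); NOT the measured data.
wild_type = {"Phe": 1.0e5, "Ala": 2.0e4}
mutant = {"Phe": 1.0e5, "Ala": 1.0e2}  # Phe efficiency kept, Ala efficiency lowered

print(specificity_shift(wild_type, mutant))  # 200.0
```

Because the shift is a quotient, the same 200-fold value could also arise from raising the Phe efficiency. The kinetic data described in the text show that the Ala efficiency falls.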

The three double mutants also showed strong preference for large side chains at position P4 (Rheinnecker et al., 1994). Among these enzymes, the mutant I107G/L126V improves the P4 specificity for large side chains to 340-fold relative to P4-Ala, but in this case the maximal discrimination was achieved with P4-Leu rather than P4-Phe. The other two double mutants similarly exhibited a maximal preference for P4-Leu. In all cases, nonadditivity was observed relative to the single mutants, as expected from the close proximity of the three side chains. Kinetic parameters were also measured toward the single-residue substrate acetyl-tyrosine ethyl ester, which might be considered as a probe measuring the extent to which S4-site mutants affect the functioning of the S1 site. Large decreases of up to 60-fold were observed, with the largest effects occurring for the double mutants. However, the same variants exhibit efficiencies comparable to wild-type when measured toward favored suc-XAPF-pNA substrates. This suggests that less productive binding may occur in the absence of the subsite interactions, particularly because the ester substrate is more easily cleaved owing to the better leaving group.
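Nonadditivity of this kind is commonly quantified with a double-mutant cycle. The effect of each mutation is expressed as a change in activation free energy, ΔΔG‡ = −RT ln[(kcat/Km)mut/(kcat/Km)wt]. The coupling energy is the double mutant's ΔΔG‡ minus the sum of the two single-mutant values, and a value of zero means the substitutions act independently. The cited studies are not reported here as having done this calculation. The sketch below is an illustrative assumption, and the kcat/Km values in it are invented.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def ddg_activation(kcat_km_mut: float, kcat_km_wt: float, temp_k: float = 298.15) -> float:
    """Change in activation free energy (kcal/mol); positive means the mutant is slower."""
    return -R_KCAL * temp_k * math.log(kcat_km_mut / kcat_km_wt)


def coupling_energy(wt: float, single_a: float, single_b: float, double_ab: float,
                    temp_k: float = 298.15) -> float:
    """Double-mutant-cycle coupling: ddG(AB) - ddG(A) - ddG(B); zero means additive."""
    return (ddg_activation(double_ab, wt, temp_k)
            - ddg_activation(single_a, wt, temp_k)
            - ddg_activation(single_b, wt, temp_k))


# Invented kcat/Km values (M^-1 s^-1) for illustration only.
wt, a, b = 1.0e5, 1.0e4, 1.0e3
additive_ab = a * b / wt  # double-mutant value that independent effects would predict

print(abs(coupling_energy(wt, a, b, additive_ab)) < 1e-9)  # True
print(round(coupling_energy(wt, a, b, 1.0e3), 2))          # -1.36
```

With the sign convention used here, a negative coupling energy means the double mutant is less impaired than the two single mutations predict.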

The substrate preference of BLAP at the P4 substrate position is also toward large hydrophobic side chains (Grøn et al., 1992). A broader range of specificities exists than in SBPN: in this case, a 24-fold (rather than sixfold) increase in kcat/Km is observed when progressing from small to large hydrophobic amino acids. The individual subsite interactions do not affect the overall catalytic efficiencies in an additive manner, suggesting that functional communication occurs and is mediated by structural elements of the protein (Grøn & Breddam, 1992). For example, modest substrate preferences at some sites are masked if the optimal P1-Phe and/or P4-Phe residues are present. These amino acids dominate the cleavage efficiency such that an upper limit in kcat/Km is reached even when other subsites are filled by nonpreferred residues. These other sites are therefore less important when a good substrate rather than a poor substrate is bound. This study underlines an important principle: optimal subsite mapping of subtilisins (and other proteases) should be carried out using sets of matched substrates where the interdependency of binding sites is not manifested. In the case of BLAP, the presence of an anthraniloyl group at P5 and a Pro at P2 apparently disrupts the P1-Phe and P4-Phe interactions, such that a substrate series containing these nonoptimal groups permits distribution of P1' site preferences over a 15-fold range. Only a 50% difference between the most and least favored P1' amino acid is observed in the absence of the nonoptimal groups, which prevents accurate mapping of the true subsite preference (Grøn & Breddam, 1992).
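Converting these fold differences to free-energy terms at 25 °C shows how small the unmasked P1' preferences are. Here RT ≈ 0.593 kcal/mol. The arithmetic below is added for illustration and is not taken from the cited study.

```latex
\Delta\Delta G = RT\ln\frac{(k_{\mathrm{cat}}/K_m)_{\text{most favored}}}{(k_{\mathrm{cat}}/K_m)_{\text{least favored}}},
\qquad RT \approx 0.593\ \mathrm{kcal\,mol^{-1}}\ (298\ \mathrm{K})

\text{15-fold range: } \Delta\Delta G \approx 0.593 \times \ln 15 \approx 1.6\ \mathrm{kcal\,mol^{-1}}

\text{50\% difference (1.5-fold): } \Delta\Delta G \approx 0.593 \times \ln 1.5 \approx 0.24\ \mathrm{kcal\,mol^{-1}}
```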

The structure of the BLAP S4 pocket is similar to that of SBPN. The side chains of Val 104, Ile 107, Leu 126, and Leu 135 form the base and one side of the pocket, whereas Ser 128, Ser 130, and Ser 132 are situated along the outside rim with each of the side-chain hydroxyl groups pointing inward. The substitution of Val 104 for the Tyr present in SBPN allows Leu 135 access to the substrate in BLAP. The only other difference in the pocket between the two enzymes is the presence in SBPN of Gly 128 rather than Ser 128. A total of 21 mutants in the BLAP S4 site have been constructed and analyzed (Bech et al., 1992, 1993; Sørensen et al., 1993). At position 104 it was found that bulky hydrophobic side chains produced enzymes that preferentially cleaved small hydrophobic side chains, and conversely, smaller amino acids increased specificity toward large substrates. This behavior is reminiscent of the effects caused by increasing the size of residue Gly 166 in the S1 site of SBPN (Estell et al., 1986; see above). Mutations at other positions in the BLAP S4 site often also showed these effects, but in many cases complex specificity profiles not immediately interpretable in simple terms were obtained. What does appear clear is that both steric and hydrophobic effects play important roles in determining the S4 specificity profile (Bech et al., 1993; Sørensen et al., 1993). For some mutants it was further suggested that structural flexibility is also critical.

Distinguishing the degree to which hydrophobicity, steric exclusion, and substrate-induced conformational changes function to determine specificity profiles requires high-resolution structural information on the mutant enzymes. Such information has begun to be obtained in the study of BAP variants (Teplyakov et al., 1992). Substitution of Val 104 in this enzyme with Trp increased activity toward suc-AAPF-pNA by 12-fold. The crystal structure of the uncomplexed variant showed that no other structural change occurs and that the S4 site is now blocked off such that a modeled P4-Ala residue makes a good van der Waals contact with Trp 104. Trp 104 in this variant is oriented nearly identically to Trp 104 in THERM, which also exhibits high activity toward suc-AAPF-pNA.

Comparison of the structures of SSI and a P4-Met to Gly mutant of SSI complexed to SBPN showed that the S4 site undergoes a substantial shrinkage upon binding of P4-Gly (Takeuchi et al., 1991b). The structural flexibility in this enzyme raises the possibility that a capacity for such rearrangement may exist in other members of the family as well. Assessing the degree of flexibility, and the extent to which amino acid alterations affect this property, will require crystal structures of wild-type and mutant enzymes complexed to substrate analogs possessing small and large side chains at the P4 position. In the case of BAP, for example, it would be of interest to determine the catalytic efficiencies of the wild-type and V104W enzymes toward larger hydrophobic P4 side chains and then to carry out a systematic structural analysis of complexes of each enzyme with analogous inhibitors. Such an analysis for the chymotrypsin-like α-lytic protease has yielded substantial insight into the structural basis for enzyme flexibility (Bone et al., 1991; see below).

Together these mutational alterations within the subtilisin S1 and S4 sites allow two important conclusions: (1) only the local environment of amino acids directly contacting substrate need be considered in designing specificity changes; (2) there is no important distinction between hydrophobic and polar enzyme-substrate interactions, because each type can be manipulated to generate new specificity profiles while maintaining high activity. The importance of these generalizations to protein design in other systems depends upon the extent to which the structural design of the binding cleft, and the nature of the reaction being catalyzed, are crucial parameters. As we shall see, structural context can have great influence in mediating the extent to which specificity alteration is straightforward. A clue to its important role can be seen in the dependence of catalytic efficiency on the extent to which subsites are filled. The signal that distal portions of substrate are bound is transmitted over large distances and must in some way be mediated by the intervening protein structure. Long-range effects are key in the chymotrypsin family of enzymes, both in terms of filling subsites and in determining specificity at a single site (Corey et al., 1992; Hedstrom et al., 1992, 1994a, 1994b; Perona et al., 1995; see below).

Prohormone convertases: Specificity toward paired dibasic residues

Tissue-specific processing of precursor proteins in mammalian cells is accomplished by a subfamily of subtilisin-class enzymes known as prohormone convertases. The need for this cleavage event to release bioactive products provides a crucial regulatory step for the cell. Early protein sequencing studies of various peptide hormones suggested that the dibasic sequences Lys-Lys and Lys-Arg provided the sites of cleavage (reviewed by Lazure et al., 1983). The first protease isolated in this class was the yeast kexin, which cleaves with high selectivity both synthetic peptide and protein substrates possessing Lys-Arg at the P2 and P1 sites, respectively (Fuller et al., 1989; Brenner & Fuller, 1992). Following isolation of the yeast enzyme, a number of mammalian species have been cloned, including furin (Van den Ouweland et al., 1990), PC1/PC3 and PC2 (Smeekens et al., 1991), and more recently the enzymes PC4, PC5, and PACE4 (Rehemtulla et al., 1993). The enzymes possess pro-domains and must therefore themselves be processed prior to activation. Maturation has been shown to occur in an autocatalytic fashion in the cases of PC2 (Matthews et al., 1994) and of furin (Creemers et al., 1993). These studies have now shown that most cleavage takes place either at Lys-Arg and Arg-Arg dibasic sites, or at an Arg-X-Lys-Arg consensus site, depending on the intracellular pathway of localization.
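As a rough illustration of the sequence rules just described, the sketch below scans a protein sequence written in one-letter codes. It looks for paired dibasic Lys-Arg or Arg-Arg sites and for the Arg-X-Lys-Arg consensus, and reports a candidate cleavage after the P1 Arg. The motif set and the function name `candidate_sites` are simplifying assumptions. Real processing also depends on localization, structural context, and residues outside P1-P4, so a match is only a candidate, not a predicted cut.

```python
import re

# Simplified motifs taken from the text (an assumption, not a validated predictor).
# Zero-width lookaheads let overlapping matches be reported.
MOTIFS = {
    "dibasic": re.compile(r"(?=([KR]R))"),  # Lys-Arg or Arg-Arg at P2-P1
    "RXKR": re.compile(r"(?=(R.KR))"),      # Arg-X-Lys-Arg at P4-P1
}


def candidate_sites(sequence: str) -> list[tuple[int, str, str]]:
    """Return (residues N-terminal to the cut, motif name, matched motif), sorted."""
    sites = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence.upper()):
            motif = match.group(1)
            # Cleavage is assumed to follow the P1 Arg, the last residue of the motif.
            sites.append((match.start() + len(motif), name, motif))
    return sorted(sites)


# Contains the P4-Arg-Ser-Lys-Arg-P1 cleavage-site sequence used in the furin study.
print(candidate_sites("AARSKRGG"))  # [(6, 'RXKR', 'RSKR'), (6, 'dibasic', 'KR')]
```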

Mature prohormone convertases are large enzymes that typically possess 600-800 amino acids. In addition to the subtilisin-like catalytic domain, they also variously possess other structural elements such as transmembrane anchors, Ser/Thr-rich regions, glycosylation sites, and Cys-rich regions (Seidah et al., 1991).



Based on homology modeling, it was predicted that these enzymes possess a greatly increased number of negatively charged residues near the substrate binding cleft. Many of these amino acids are highly conserved (Siezen et al., 1991; Fig. 5). Their importance was tested by site-directed mutagenesis of furin, using processing of a peptide hormone in vivo as the functional assay (Creemers et al., 1993). The following residues were mutated: Asp 33, Asp 61, Glu 101, Asp 104, Glu 107, Glu 129, Asp 130, Asp 131, Asp 165, and Asp 209. Cleavage was assayed toward the wild-type hormone precursor as well as toward three mutants in which one of the positively charged amino acids in the cleavage site sequence P4-Arg-P3-Ser-P2-Lys-P1-Arg was altered to Gly or Ala. The ability of mutants to carry out autoproteolytic activation was also assessed.

Mutation of the P1-Arg in this sequence gave rise to prohormones that could not be processed either by wild-type or by any of the mutant furins, suggesting that a basic residue at this position is critical to recognition (Creemers et al., 1993). Several of the mutants possessed preferences for one of the three mutant prohormone substrates, implicating the Asp or Glu at that enzyme position in recognition of the substrate residue that was altered. Thus, Asp 33 is implicated in P2-site binding and Glu 107 in P4-site binding, in accord with modeling that predicts their locations adjacent to these substrate positions (Siezen et al., 1991). Mutation of Asp 165, predicted to lie at the base of the S1 site, abolished activity, as did removal of the negative charge from positions Glu 129, Asp 130, or Asp 131, putatively near the P4 site. Interestingly, the roughly isosteric mutant D209L abolished activity, despite being located some distance from the binding cleft. By contrast, other substitutions nearer to the substrate could be introduced without loss of activity. These furin mutants provide the first mapping of structural determinants affecting prohormone processing. An obvious need now exists for an accurate three-dimensional structure of an enzyme in this class. Together with detailed kinetic analysis of synthetic substrates, this would provide substantial insight into the structural determinants of this most interesting specificity.

Fig. 5. A distinct subclass of the subtilisin family of serine proteases, the prohormone convertases, is involved in prohormone processing in a number of important physiological contexts. The specificity of processing is toward sites possessing 2-4 Arg and Lys residues at the P1-P4 positions. Shown is a solvent-accessible protein surface on which are mapped the binding determinants specifying prohormone processing by furin. The structure is that of subtilisin BPN' complexed to SSI because no three-dimensional structure is yet available in this subclass. A large number of negatively charged amino acids is found on the substrate binding face of the enzyme (red). The catalytic triad is in blue and the substrate is in yellow, with the P1-P4 amino acids in green.

Substrate specificity in the chymotrypsin family

As in the subtilisin family of enzymes, the diversity of substrate specificity among the chymotrypsin-like proteases rests upon small differences in structure in the substrate-binding cleft. All of the chymotrypsin-like enzymes are composed of two juxtaposed β-barrel domains, with the catalytic residues bridging the barrels (Fig. 1A; Kraut, 1977; Steitz & Shulman, 1982; Bazan & Fletterick, 1990). Crystal structures are available for bovine chymotrypsin (Matthews et al., 1967), porcine pancreatic elastase (Watson et al., 1970), bovine, rat, and Streptomyces griseus trypsins (Ruhlmann et al., 1973; Sprang et al., 1987; Read & James, 1988), rat tonin (Fujinaga & James, 1987), kallikrein (Bode et al., 1983), rat mast cell protease II (Remington et al., 1988), human neutrophil elastase (Navia et al., 1989), thrombin (Bode et al., 1989a), factor Xa (Padmanabhan et al., 1993), and complement factor D (Narayana et al., 1994). Additionally, structures are available for four microbial enzymes: S. griseus proteases A, B, and E (SGPA, Delbaere et al., 1979; SGPB, Moult et al., 1985; SGPE, Nienaber et al., 1993), and the Lysobacter enzymogenes α-lytic protease (Brayer et al., 1979). The microbial enzymes share the chymotrypsin-like bilobal β-barrel structure but are more distantly related, as evidenced by their shorter sequences and substantial structural differences in surface loops (James, 1976). S. griseus trypsin, on the other hand, is an example of a microbial enzyme that is more homologous to mammalian serine proteases than to its bacterial counterparts (Read & James, 1988).

Molecular modeling methods have been used to create a structure-based sequence alignment of the chymotrypsin-like serine proteases (Greer, 1990), which is very useful in assessing substrate preferences. The specificity is usually most pronounced at the S1 sites of the enzymes, where the majority of sequences group into one of three subclasses definable by inspection of a small number of crucial amino acids. Position 189, located at the base of the S1 pocket, is very highly conserved as an Asp in enzymes with trypsin-like specificity toward Arg- and Lys-containing substrates (Fig. 6; the chymotrypsin numbering system is used throughout; see Greer, 1990). It is found as a Ser or other small amino acid in chymotrypsin- and elastase-class enzymes, which manifest specificity toward aromatic and small hydrophobic amino acids, respectively. The amino acid side chains at positions 190 and 228 also extend into the base of the pocket and play an additional role in modulating the specificity profile. Amino acids at positions 216 and 226 are usually Gly in both trypsin- and chymotrypsin-like enzymes; larger amino acids at these positions partially or fully block access of large substrate side chains to the base of the pocket (Fig. 6). Accordingly, elastases possess larger, usually nonpolar residues at these positions, providing a platform for interaction with small hydrophobic substrate P1 amino acids. The shapes of the S1 pockets of trypsin, chymotrypsin, and elastase thus appear to readily explain the observed specificities, leading to the canonical view that substrate preferences are in fact determined by this limited set of amino acids (Stroud, 1974). However, as discussed below, this perspective has now been shown to be incorrect by the discovery that other structural elements distant from the substrate binding site are also crucial determinants of specificity.

Fig. 6. Common architecture of the S1 site of four members of the chymotrypsin-like class of serine proteases, with the eponymous Ser 195 catalytic residue shown in blue. An early paradigm for substrate specificity was derived from a comparison of the S1-site structures of trypsin (A), chymotrypsin (B), and pancreatic elastase (C). Amino acids at positions 216 and 226 (left side of the pocket) and at 189 and 190 (right side) are indicated by van der Waals surfaces colored white for uncharged and red for negatively charged residues. The shape and electrostatic character of each site corroborate the specificities toward Arg/Lys, Phe/Tyr/Trp, and Ala, respectively. Fiddler crab collagenase (D) possesses a negatively charged Asp in an altered position relative to trypsin. Although it might be predicted that this enzyme possesses a trypsin-like specificity profile, it is instead capable of efficiently cleaving P1 side chains of substrates specific to each of the three other proteases. (E) Amino acid sequence alignment of these four enzymes, showing the distinction in primary specificity residues (bold) and secondary determinants (underlined). Positions in the sequence of two adjacent surface loops are also shown (see Figs. 7, 11, 13).
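The canonical S1-site reading described above can be written as a small rule over positions 189, 216, and 226 (chymotrypsin numbering). The sketch below is a deliberately simplified, hypothetical encoding of that rule, not a published predictor. Its residue sets, `ACIDIC` and `SMALL`, are assumptions. Applied to the fiddler crab collagenase arrangement (Gly 189, Gly 216, Asp 226), the rule predicts a trypsin-like enzyme. The enzyme's broad specificity contradicts exactly that prediction.

```python
ACIDIC = {"D", "E"}
SMALL = {"G", "A", "S", "T"}  # assumed set of "small" residues at position 189


def classify_s1(res189: str, res216: str, res226: str) -> str:
    """Canonical S1-pocket rule from one-letter codes at positions 189, 216, 226."""
    # An acidic group at the pocket base, or an acidic 226 left exposed by Gly 216,
    # is read as a binding site for a P1 Arg or Lys.
    if res189 == "D" or (res226 in ACIDIC and res216 == "G"):
        return "trypsin-like"
    if res189 in SMALL:
        # Gly at both 216 and 226 leaves a deep pocket for aromatic side chains;
        # larger residues there fill the pocket and favor small hydrophobic ones.
        if res216 == "G" and res226 == "G":
            return "chymotrypsin-like"
        return "elastase-like"
    return "unclassified"


print(classify_s1("D", "G", "G"))  # trypsin pattern -> trypsin-like
print(classify_s1("S", "G", "G"))  # chymotrypsin pattern -> chymotrypsin-like
print(classify_s1("G", "G", "D"))  # crab collagenase pattern -> trypsin-like (the rule fails here)
```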

Kinetic measurements of substrate preferences for the two mammalian elastases of known structure (PPE and HNE) permit a more detailed appraisal of structure-function relationships (Bode et al., 1989b). Both enzymes possess bowl-shaped hydrophobic S1 binding sites that accommodate small hydrophobic substrates (Watson et al., 1970; Navia et al., 1989). However, the S1 site of PPE has been described as slightly less hydrophobic and marginally smaller than that of HNE (Bode et al., 1989b). PPE cleaves peptide bonds preferentially at small P1-Ala and Nva side chains (Harper et al., 1984), whereas HNE manifests substantial activity toward the branched-chain Val, Ile, and Leu residues (Harper et al., 1984; Stein et al., 1987). These preferences are in accord with the smaller S1 site of PPE, but the small difference in size is insufficient to account for the altered profiles. The amino acids that line the S1 pockets differ substantially in the two enzymes, most notably by the presence of the charged Asp 226 in HNE, which is present as a Thr in PPE. In HNE, Asp 226 is buried by Val 216 and Val 190, and the carboxylate group points away from substrate into a network of buried water molecules (Navia et al., 1989). One possible explanation for the superior ability of HNE to cleave branched-chain substrates is thus that its S1 site possesses greater intrinsic flexibility as a consequence of its different construction and interaction with surrounding portions of the structure (Bode et al., 1989b). A small shrinkage of the S1 site is in fact observed upon binding Val relative to Leu in this position (Bode et al., 1986b; Wei et al., 1988).

Cleavage of peptide substrates adjacent to the acidic Asp and Glu residues is the hallmark of an additional subclass of enzymes. Recognition of the negatively charged carboxylate is accomplished by means of a His residue at position 213 in a number of microbial enzymes, including the Staphylococcus aureus V8 protease (Drapeau, 1978), SGPE (Svendsen et al., 1991), and two epidermolytic toxins of S. aureus (Dancer et al., 1990). Recently, the crystal structure of SGPE complexed with the tetrapeptide Ala-Ala-Pro-Glu has been determined at 2.0 Å resolution (Nienaber et al., 1993). The structure reveals that the Glu carboxylate is indeed bound directly by His 213 as well as by the side chains of Ser 192 and Ser 216. The structure of the enzyme also shows that His 213 is hydrogen bonded in series to two other His residues at positions 199 and 228 to form a solvent-inaccessible His triad that penetrates through the core of the enzyme. This remarkable structural feature is postulated to play a role in substrate charge compensation, by delocalizing the substrate negative charge through proton transfer across the His residues (Nienaber et al., 1993). No other serine protease is known to possess the His triad. An alternative to the use of His 213 is found in a protease from cytotoxic T-lymphocytes, which possesses an Arg at position 226 (Murphy et al., 1988). This enzyme is unusual in its preference for cleavage at Asp rather than Glu residues (Odake et al., 1991). Mutation of Arg 226 to Gly, followed by qualitative assay of crude lysates in which the variant was expressed, showed lowered activity toward peptidyl P1-Asp thiobenzyl ester substrates and increased activity toward analogous P1-Phe substrates (Caputo et al., 1994).

Virtually all chymotrypsin-like serine proteases share a common feature: an S1-site specificity that is restricted to a relatively narrow subset of the naturally occurring amino acids. It therefore came as some surprise when one enzyme, the collagenolytic serine protease 1 from the fiddler crab Uca pugilator, was shown to possess high catalytic activity toward each of trypsin-, chymotrypsin-, and elastase-like substrates (Grant & Eisen, 1980). The specificity profile of this enzyme has recently been reexamined in detail (Tsu et al., 1994). Crab collagenase exhibits 5% of elastase, 10% of chymotrypsin, and 65% of trypsin activity, as assessed by kcat/Km values toward peptidyl amide substrates possessing Ala, Phe, and Arg, respectively, at the P1 position. kcat values toward each of these amino acids are extremely high. Additionally, it is the most efficient chymotrypsin-like enzyme known toward P1-Leu and P1-Gln amide substrates, manifesting 6-fold and 50-fold greater activities than does chymotrypsin toward these substrates (Tsu et al., 1994). Therefore, the chymotrypsin-like scaffold can maintain an S1 binding pocket that accommodates a very broad range of amino acids without sacrificing catalytic efficiency.

Crab collagenase exhibits an interesting rearrangement of a negative charge at the base of the S1 site: residues Asp 189 and Gly 226 of trypsin are altered to Gly 189 and Asp 226 in collagenase (Grant et al., 1980; Fig. 6). This arrangement would, however, predict a strict specificity for P1-Lys and Arg substrates: the amino acids at positions 190 and 216 are Thr and Gly, respectively, which allows access of the substrate to Asp 226. As discussed above, Asp 226 of human neutrophil elastase is buried by Val 216, leading to a hydrophobic specificity profile (Navia et al., 1989). A possible explanation for the ability of crab collagenase to accommodate hydrophobic as well as positively charged substrate residues is provided by a recently refined 2.5-Å crystal structure of the enzyme complexed with the dimeric serine protease inhibitor ecotin (J.J. Perona, C.A. Tsu, C.S. Craik, & R.J. Fletterick, submitted for publication). The structure shows that one carboxylate oxygen of Asp 226 is accessible to substrate, but that the P1-methionine residue of ecotin does not enter the S1 site and binds instead on the surface of the enzyme adjacent to the disulfide bond at positions 191-220. Modeling shows that the pocket can provide multiple binding sites that accommodate diverse amino acid side chains in distinct positions. Therefore, S1-site flexibility does not appear to be utilized as a structural determinant in the broad specificity of crab collagenase.



α-Lytic protease: Exploring the role of structural plasticity in substrate specificity

α-Lytic protease, an extracellular enzyme produced by the soil bacterium L. enzymogenes, has been the subject of intensive analysis aimed at relating structure to catalytic activity. This microbial protease, while possessing the chymotrypsin-like fold comprising two β-barrels (Brayer et al., 1979), nevertheless displays large insertions and deletions relative to the pancreatic enzymes, resulting in an overall RMS deviation in the positions of structurally equivalent α-carbons of 1.36 Å for 110 of 198 amino acids, when compared with chymotrypsin (Fujinaga et al., 1985). By comparison, the equivalent pairwise fits with the bacterial proteases SGPA and SGPB yield RMS deviations of roughly 0.7 Å, a value very similar to that which relates the mammalian pancreatic enzymes to each other. The S1 pockets of α-lytic protease and trypsin are particularly divergent in structure (Fig. 7). An insertion of two amino acids causes Met 192 of α-lytic protease to occupy a position similar to Ser 190 of trypsin. More strikingly, an adjacent surface loop at positions 185-188 is deleted in α-lytic protease, and a second nearby loop at positions 217-225 is enlarged by eight amino acids. A consequence of these differences is that, although both enzymes possess a disulfide bond linking the conserved residues Cys 191 and Cys 220, the positions of the sulfur atoms are displaced by 7-8 Å (Fig. 7).
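The RMS deviations quoted here are computed only over structurally equivalent α-carbons, after the two structures have been superposed. For the α-lytic protease and chymotrypsin comparison, that is 110 of 198 residues. The sketch below shows only the final step. It assumes the equivalent atoms have already been paired and superposed, for example by a least-squares fit such as the Kabsch method. The function name and example coordinates are hypothetical.

```python
import math


def rmsd(coords_a: list[tuple[float, float, float]],
         coords_b: list[tuple[float, float, float]]) -> float:
    """Root-mean-square deviation between paired, already superposed atoms (same units)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two hypothetical alpha-carbon pairs, each displaced by 1 Angstrom along x.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))  # 1.0
```

Because only the equivalent subset enters the sum, an RMS deviation should always be read together with the number of atoms used, as in the text's "110 of 198 amino acids."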
Fig. 7. Diversity in S1-site structure between the mammalian and the microbial trypsin-like enzymes is illustrated by a superposition of trypsin (green) and α-lytic protease (red). Although the mammalian enzymes such as trypsin possess two well-defined loops (loop 1 and loop 2) joining the β-strands of the specificity pocket, in α-lytic protease and other microbial enzymes loop 1 is absent, whereas loop 2 is greatly enlarged. Conserved disulfide bonds of each enzyme (Cys 191-Cys 220; yellow) are displaced some 7 Å from each other. The catalytic triad is shown at the top in green.



Kinetic data show that α-lytic protease possesses a hydrophobic specificity profile for substrate residues in the P1 position. The preference of the enzyme at P1, as described by relative kcat/Km values, is roughly Ala > Met, Val, Gly > Nle > Leu > Phe for hydrolysis of tetrapeptide amide substrates (Bauer et al., 1981; Bone et al., 1991). The structural elements that interact with the P1-substrate side chains comprise the three hydrophobic side chains Met 192, Met 213, and Val 217a, which together form a shallow depression in the enzyme surface (Brayer et al., 1979; Fujinaga et al., 1985; Fig. 8). More recently, six crystal structures of the enzyme complexed with peptidyl boronic acid inhibitors of the general structure R-boroX (where R is methoxysuccinyl-Ala-Ala-Pro and boroX is the α-aminoboronic acid analog of Ala, Val, Ile, Nle, Leu, or Phe) have been determined at resolutions between 2.0 and 2.5 Å (Bone et al., 1987, 1989a, 1991). Boronic acids are tight-binding (Ki values in the nanomolar range) reversible inhibitors of serine proteases (Kettner & Shenvi, 1984) that form covalent, nearly tetrahedral adducts with Ser 195 (Bone et al., 1987). They represent good structural analogs of the high-energy tetrahedral intermediate present on the actual catalytic pathway.
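To put "nanomolar" in energetic terms, the standard binding free energy follows from the dissociation constant as ΔG° = RT ln Ki, with Ki in molar units and a 1 M standard state. The worked value below uses a hypothetical Ki of 1 nM at 25 °C. It is illustrative only and is not a measured value for any of the boronic acid complexes.

```latex
\Delta G^{\circ}_{\text{bind}} = RT\ln K_i
\approx 0.593\ \mathrm{kcal\,mol^{-1}} \times \ln\!\left(1\times10^{-9}\right)
\approx -12.3\ \mathrm{kcal\,mol^{-1}}
```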

The crystal structures of the boronic acid complexes confirm that covalent tetrahedral adducts are formed with Oγ of Ser 195 for the P1-Ala, Val, Ile, Leu, and Nle inhibitors. The large P1-Phe side chain cannot fit into the S1 site, leading to the formation of an unusual trigonal adduct that includes His 57 (Bone et al., 1989a). The interactions of the inhibitor among these structures are nearly identical with the exception of the way in which the P1 side chains interact with Met 192, Met 213, and Val 217a. These side chains adjust conformation in response to the differing sizes and shapes of the inhibitor amino acids. Small shifts in the position of adjacent main-chain atoms in the S1 and S2 specificity sites occur in the complexes with the larger Nle and Phe. Particular importance has been ascribed to the rearrangements at positions 217a-217d (Bone et al., 1989a, 1991; see below). Low activity toward the larger Leu and Phe side chains appears to arise solely from steric considerations, whereas Met is preferred to Leu presumably owing to its greater flexibility. Although the structural basis for the preference of Ala relative to Val was not unambiguously clear, it was proposed that strong binding to the oxyanion hole, required in the transition state, is prevented for the Val substrate on steric grounds. Differences in the electronic character of the boronate inhibitor, relative to a true transition state, do not allow for a complete mimicking of the latter (Bone et al., 1989a).

Fig. 8. Structure of the S1 site of α-lytic protease bound to the substrate analog suc-Ala-Ala-Pro-Ala-boronic acid (red), showing the positions of the hydrophobic amino acids Met 192, Met 213, and Val 217a, which form a platform for binding of small hydrophobic side chains. The three β-strands of the S1 site are shown in yellow and the large connecting loop is in green. Catalytic groups are also in green (top). Mutation of either Met 192 or Met 213 to Ala creates variant enzymes possessing greatly broadened specificities toward hydrophobic amino acids, without sacrificing catalytic efficiency.

The substrate specificity profile of α-lytic protease was altered dramatically by the introduction of either of two single-site mutations in the S1 site: M192A or M213A (Bone et al., 1989b; Table 2; Figs. 8, 9). In each case, high activity toward Ala was retained, but the increased size of the S1 pocket allowed accommodation of P1 side chains as large as Phe, with catalytic efficiencies (kcat/Km) increased up to 15-fold relative to wild-type cleavage at P1-Ala. For M192A, improved catalytic efficiencies toward P1-Met and P1-Val resulted mainly from lowered Km values, whereas the P1-Leu and P1-Phe substrates were improved in both kcat and Km. The catalytic activity toward P1-Leu and P1-Phe substrates was improved by roughly 10³- and 10⁵-fold, respectively, relative to wild type. However, the wild-type preference of nearly 10⁵-fold for P1-Ala over P1-Phe was decreased to 30-fold in M192A and nearly completely eliminated in M213A (Table 2). Two factors complicated a straightforward interpretation of the profiles of these variants: (1) the values of kcat, Km, and kcat/Km were not correlated with the size or hydrophobicity of the P1 side chain; (2) enlargement of the pocket by the same volume in the two mutants gave rise to considerably different functional effects. Therefore, extensive structural analysis of the mutant enzymes complexed with the boronic acid inhibitors was carried out to understand which factors cause the altered specificities (Bone et al., 1989b, 1991).
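The fold changes discussed above can be checked directly against Table 2. The Python sketch below hard-codes the kcat/Km values as printed in that table (suc-Ala-Ala-Pro-X-pNA substrates) and computes mutant-to-wild-type ratios and the wild-type Ala/Phe preference; the dictionary and helper names are illustrative, not taken from the cited work.

```python
import math

# kcat/Km values (M^-1 s^-1) for suc-Ala-Ala-Pro-X-pNA, as printed in Table 2
TABLE_2 = {
    "Ala": {"WT": 21_000, "M192A": 10_000, "M213A": 600},
    "Val": {"WT": 790, "M192A": 3_000, "M213A": 340},
    "Met": {"WT": 1_800, "M192A": 35_000, "M213A": 980},
    "Leu": {"WT": 4.1, "M192A": 11_000, "M213A": 160},
    "Phe": {"WT": 0.38, "M192A": 31_000, "M213A": 340},
}


def fold_change(residue: str, enzyme: str, reference: str = "WT") -> float:
    """kcat/Km of `enzyme` divided by that of `reference`, for one P1 residue."""
    row = TABLE_2[residue]
    return row[enzyme] / row[reference]


def preference(enzyme: str, favored: str, disfavored: str) -> float:
    """kcat/Km ratio between two P1 residues cleaved by the same enzyme."""
    return TABLE_2[favored][enzyme] / TABLE_2[disfavored][enzyme]


for res in ("Leu", "Phe"):
    fc = fold_change(res, "M192A")
    print(f"M192A vs WT at P1-{res}: {fc:,.0f}-fold (log10 {math.log10(fc):.1f})")
print(f"WT preference for P1-Ala over P1-Phe: {preference('WT', 'Ala', 'Phe'):,.0f}-fold")
```

On the printed values this gives about 2.7 × 10³-fold for P1-Leu and 8 × 10⁴-fold for P1-Phe, and a wild-type Ala/Phe preference of about 5.5 × 10⁴-fold.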

The principal rationale for the exceptionally broad specificity profiles of M192A and M213A is that the S1 site possesses structural plasticity, which encompasses a combination of alternate side-chain conformations as well as deformability of the main chain (Bone et al., 1989b; Fig. 9). For example, accommodation of the P1-Phe side chain by M192A results from a substrate-induced conformational change, in which the side chain of Val 217a rotates to remove one carbon from the pocket, and the main chain from Val 217a to Val 217d shifts by 0.5-0.8 Å. This permits the large, inflexible aromatic ring to be nearly completely buried in the specificity pocket. In this case, some of the binding energy is presumably used to drive the conformational change in the protein, a phenomenon that is also observed to lesser extents in other mutant-inhibitor complexes. In general, hydrogen bond lengths, buried hydrophobic surface area, unfilled cavity volume, and the magnitude of conformational changes vary significantly among the various mutant and wild-type complexes (Bone et al., 1991). The energetic consequences of these differences were quantified (see Bone & Agard, 1991, for a review of the energetics of intermolecular interactions) and correlated with free energies of catalysis for the various mutant-substrate combinations.
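One common way to place the kinetic side of such comparisons on an energy scale is to convert a ratio of kcat/Km values into a difference in apparent activation free energy, ΔΔG‡ = -RT ln[(kcat/Km)a / (kcat/Km)b]. The Python sketch below assumes 25 °C and applies this general relation to the P1-Phe entries of Table 2; it is not necessarily the exact procedure used by Bone et al. (1991).

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def ddg_activation(kcat_km_a: float, kcat_km_b: float, temp_k: float = 298.15) -> float:
    """Apparent difference in activation free energy, dG(a) - dG(b), in kcal/mol.

    Uses dG_a - dG_b = -RT * ln[(kcat/Km)_a / (kcat/Km)_b]; a negative result
    means case a has the lower barrier (more efficient catalysis).
    """
    return -R_KCAL * temp_k * math.log(kcat_km_a / kcat_km_b)


# P1-Phe, M192A versus wild type (Table 2 values, M^-1 s^-1)
print(f"{ddg_activation(31_000, 0.38):.1f} kcal/mol")
```

For P1-Phe this gives roughly -6.7 kcal/mol, that is, M192A lowers the apparent barrier by about 6.7 kcal/mol relative to the wild-type enzyme.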

The analysis has led to an increased understanding of the way in which the different energetic terms can contribute to the stabilization of the enzyme-substrate complex, although no single factor has been found that consistently correlates well with either activity or inhibition (Bone et al., 1991). Thus, the wild-type enzyme has a relatively limited ability to adapt to large side chains, so that its specificity profile is driven primarily by steric exclusion. M192A, however, is better able to hydrolyze large side chains, in part because the degree of conformational change required for their accommodation is reduced; it also retains the ability to shrink, so that P1-Ala substrates are hydrolyzed well. By contrast, the M213A pocket cannot contract, leading to sharply reduced activity toward P1-Ala as well as reduced discrimination relative to P1-Gly (Bone et al., 1991). In both mutants, however, the broad specificities depend on the ability of the main-chain and side-chain atoms at positions 217a-217d to readjust (Fig. 9). This flexibility is proposed to arise from a large adjacent surface loop, which begins at residue 217a (Figs. 7, 8) and appears to be able to absorb structural changes in the preceding residues. The energies of interaction of the S1 site with this and other peripheral structural elements thus also play a significant role in determining the specificity profiles.

Table 2. Broadening the specificity of α-lytic protease(a)

X      Wild type    M192A     M213A
Ala    21,000       10,000    600
Val    790          3,000     340
Met    1,800        35,000    980
Leu    4.1          11,000    160
Phe    0.38         31,000    340

(a) Substrate: suc-Ala-Ala-Pro-X-pNA. Values are kcat/Km in s⁻¹ M⁻¹.

Fig. 9. The principal rationale for the ability of α-lytic protease mutants to exhibit greatly enhanced specificities toward new substrate side chains is structural plasticity of the S1 site. Shown is a superposition of five structures of the M192A variant of the enzyme (the new Ala 192 side chain is at the right side). Each enzyme is complexed with a peptidyl boronate inhibitor (not shown for clarity) possessing a particular hydrophobic P1 side chain (see Fig. 8 for inhibitor binding). The conformation of the active site adjusts to the different substrates at position Gly 216 and in the following loop region (bottom). Both side-chain and main-chain rearrangements are important components of active-site plasticity. The ability of the active site to adjust in this manner may be an important factor in the ability to effect specificity modification by mutation at only a single site.

Another recent study of α-lytic protease used random mutagenesis of four residues in the substrate binding pocket, coupled to an activity screen using synthetic substrates, to identify new variants with altered specificities (Graham et al., 1993). A library was constructed beginning with the M192A variant, with randomization of positions Gly 192a, Arg 192b, Met 213, and Val 217a. Screening and qualitative characterization of 47 active variants revealed that a majority of the enzymes retained a specificity profile similar to that of the parent M192A. Also emerging from the screen was a subclass of enzymes capable of cleaving P1-His-containing substrates. All mutants possessing this ability contained His 213, an amino acid heretofore correlated with P1-Glu specificity in other microbial enzymes (Nienaber et al., 1993). In general, residue 213 appears to play a significant role as a primary specificity determinant in several microbial enzymes. Although this amino acid has not yet been mutated in any mammalian protease, it appears very unlikely that it will assume a similar role. Clearly the divergence in structure of the S1 site in the two subclasses (Fig. 7) has led to a more prominent role for this residue in the bacterial enzymes, despite the fact that its position relative to the Ser 195/His 57 catalytic couple does not vary.

Kinetic data indicate that α-lytic protease makes substrate binding interactions over at least six subsites from P2' to P4 (Bauer et al., 1981). Interestingly, the crystal structure shows that a small hydrophobic pocket exists beyond the P4 side chain of the tetrapeptide boronic acid inhibitor, formed from residues Leu 227, Leu 180, Val 167, Ala 169, and Ser 225 (Bone et al., 1987). Although extension of a substrate side chain to fill the S5 site does not have a significant influence on kinetic parameters (Bauer et al., 1981), it is possible that additional binding energy from interactions in the hydrophobic pocket cannot be realized in catalysis unless a P6 side chain is also bound. Little specificity has been observed at the other subsites, although a preference for Pro at position P2 has been noted in binding of the peptide boronic acid inhibitors (Bone et al., 1987). Although the S2 enzyme site is hydrophobic, adjacent side-chain hydroxyl groups of Ser 214 and Tyr 171 participate in a hydrogen-bonding network, which includes the carboxylate of Asp 102. Introduction of the mutations S214A and Y171F caused decreases in both kcat and Km, and the data were used to infer that the role of the two hydroxyl groups in the native enzyme is to facilitate catalysis by maintaining the S2 site in an optimal configuration (Epstein & Abeles, 1992).

Mutational analysis of trypsin: Combining structural genetics, classical enzymology, and X-ray crystallography

Trypsin represents the third serine protease that has been the subject of extensive mutational analysis aimed at an understanding of substrate specificity. These studies have focused largely on the origins of specificity at the primary S1 site. At this position, trypsin hydrolyzes amide substrates containing P1-Lys and P1-Arg amino acids several orders of magnitude more efficiently than those containing the next-preferred residues (Graf et al., 1988; Evnin et al., 1990). The preference of the enzyme is 2- to 10-fold in favor of Arg- relative to Lys-containing substrates (Craik et al., 1985; Perona et al., 1993c). As might be expected from their structural disparity, Lys and Arg interact differently with the primary determinants Asp 189 and Ser 190 (Ruhlmann et al., 1973; Bode et al., 1984; Fig. 10). The guanidinium group of P1-Arg substrates makes an ion-pair interaction with Asp 189, whereas the interaction of P1-Lys is solely through a water-mediated contact. Both Arg and Lys substrate side chains also interact with Ser 190.

An early study assessed the precision with which the S1 site is constructed by introducing small perturbations: the Gly residues at positions 216 and 226 were converted to Ala, resulting in the three trypsin mutants G216A, G226A, and G216A/G226A (Craik et al., 1985; Fig. 10). Relative specificities for tripeptide amide P1-Arg/Lys substrates, as assessed by the ratio of kcat/Km values, were altered by up to 20-fold. Catalytic efficiencies were decreased by factors ranging from 40-fold to several orders of magnitude, and these effects involved significant decreases in kcat as well as higher Km values. The differential effects on the kcat/Km values resulted in enzymes that were more Arg specific (G216A) and more Lys specific (G226A) than the wild-type enzyme. Subsequent crystal structure determinations of trypsins G226A (Wilke et al., 1991) and G216A (M.E. McGrath & R.J. Fletterick, unpubl. results) complexed with benzamidine showed that the alanine substitutions produced no structural perturbations beyond the immediate vicinity of the mutated residues. Because the catalytic triad amino acids Ser 195, His 57, and Asp 102 are unaffected by these binding pocket alterations, it is highly probable that the decreases in kcat are attributable to altering the catalytic register of the scissile bond. These data thus provided an early demonstration that substrate binding and catalytic turnover are interrelated functions in trypsin, and that they can be affected differentially to alter the function of the enzyme.

Fig. 10. The role of the position of the negative charge at the base of the trypsin S1 site has been probed by random and site-directed mutagenesis coupled to crystal structure analysis of variants. Shown is the structure of the S1 binding pocket of trypsin, indicating the positions of the negatively charged group as determined by X-ray crystal structures. Blue, wild-type trypsin at position 189; red, trypsin D189G/G226D at position 226; yellow, exogenously added acetate ion in trypsin D189S (acetate reconstitutes activity toward P1-Arg- and P1-Lys-containing substrates). Wild-type amino acids at positions 216 and 226 are each Gly, permitting access of the large P1-Lys (green) and P1-Arg side chains to Asp 189.

A series of studies has addressed the role of the negatively charged Asp 189 residue in binding and catalysis. These investigations have made use of both site-directed mutagenesis and a genetic selection approach for the isolation of new variants (Fig. 4B). The selection is based on expression of a library of trypsin variants in the periplasmic space of an E. coli strain that is auxotrophic for arginine or lysine (Evnin et al., 1990). Cells are plated on minimal media containing a nonnutritive substrate analog of one of these amino acids; active trypsins cleave the analog, liberating free amino acid and thereby relieving the auxotrophy (Evnin et al., 1990; Perona et al., 1993a).
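The logic of this growth selection can be caricatured in a few lines of Python. In the hypothetical model below, a clone grows only if its trypsin variant cleaves the analog at least as fast as a fixed threshold rate; the threshold, the activity values, and the variant names are invented placeholders, and real growth depends on more than a single activity value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str        # hypothetical label
    activity: float  # hypothetical rate of analog cleavage, arbitrary units


def survives(variant: Variant, growth_threshold: float) -> bool:
    """All-or-none growth model: the clone forms a colony only if its enzyme
    releases the needed amino acid at least as fast as the threshold rate."""
    return variant.activity >= growth_threshold


def select(library: list[Variant], growth_threshold: float) -> list[Variant]:
    """Return the library members whose host cells would grow on minimal medium."""
    return [v for v in library if survives(v, growth_threshold)]


# Hypothetical library and threshold, for illustration only
library = [
    Variant("variant_A", 500.0),
    Variant("variant_B", 20.0),
    Variant("variant_C", 0.5),
    Variant("variant_D", 0.0),  # e.g., an inactive enzyme
]
survivors = select(library, growth_threshold=10.0)
print([v.name for v in survivors])
```

The sketch makes one practical point explicit: a selection reports only which clones cross the threshold, so the survivors still require separate kinetic characterization.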

Twenty variant trypsins have been isolated from a library of 400 possible mutants encompassing the amino acids at positions 189 and 190 at the base of the S1 site. Kinetic characterization of these enzymes, as well as of the variants D189K (Graf et al., 1987) and D189S (Graf et al., 1988), indicates that the presence of a negative charge at the base of the binding pocket is essential to high-level catalysis by trypsin. Variants lacking the negative charge are compromised in kcat/Km toward peptidyl Arg- or Lys-containing amide substrates by several orders of magnitude or more. Activity toward these substrates is partially restored by the presence of an Asp or Glu residue at position 189 or 190. The variants span a range of catalytic efficiencies from wild-type levels to decreases of several orders of magnitude (Evnin et al., 1990; Perona et al., 1993a).
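The size of such libraries, and the number of clones needed to sample them, follows from simple counting. The Python sketch below computes the protein-level diversity for two fully randomized positions (20² = 400, matching the library at positions 189 and 190) and for four positions (20⁴ = 160,000, which would apply to a four-position library such as that of Graham et al., 1993, if all 20 amino acids were allowed at each site), and estimates how many clones would be needed to see 95% of the variants. It assumes every variant is equally represented and ignores codon degeneracy and stop codons; the 95% target is an arbitrary choice.

```python
import math


def library_size(n_positions: int, n_amino_acids: int = 20) -> int:
    """Distinct protein sequences when n positions are fully randomized."""
    return n_amino_acids ** n_positions


def clones_for_coverage(n_variants: int, coverage: float = 0.95) -> int:
    """Clones to sample so that the expected fraction of variants seen reaches `coverage`.

    Assumes all variants are equally represented (protein level, ignoring codon
    bias and stop codons), so the expected fraction sampled is 1 - exp(-N/V).
    """
    if not 0 < coverage < 1:
        raise ValueError("coverage must lie strictly between 0 and 1")
    return math.ceil(-n_variants * math.log(1 - coverage))


for positions in (2, 4):
    v = library_size(positions)
    print(f"{positions} positions: {v:,} variants, ~{clones_for_coverage(v):,} clones for 95%")
```

Under these assumptions, roughly three times as many clones as variants are needed for 95% coverage, so the required screening effort grows twentyfold with each additional randomized position.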

A framework for the interpretation of these data is provided by kinetic and crystallographic investigation of two other variants: trypsins D189G/G226D (Perona et al., 1993b, 1993c) and D189S (Perona et al., 1994). The structure of each mutant enzyme was determined complexed with the protein inhibitors APPI and/or BPTI, which are analogs of the substrate Michaelis complexes possessing Arg and Lys, respectively, at the P1 position (Perona et al., 1993b). This allows direct comparison of substrate-like interactions of Arg and Lys side chains in the binding pockets of wild-type and mutant enzymes. Trypsin D189G/G226D is equally reduced (10-fold) in binding affinity toward Lys and Arg substrates and is sharply lowered, by orders of magnitude, in kcat toward Arg. The crystallographic analysis showed that Asp 226 is partially sequestered from substrate by intramolecular interactions made with Ser 190 and Tyr 228, such that only a single carboxylate oxygen is available for substrate binding. Further, comparisons with the wild-type interactions indicated no correlation between the binding affinities of either Lys or Arg substrates and the number of direct contacts made with Asp 226. Therefore, it appears that substrate binding affinity to trypsin depends upon the accessibility of the negative charge to substrate and not upon the formation of direct interactions. This observation implies that direct electrostatic hydrogen-bonding interactions between the substrate Lys/Arg and the enzyme carboxylate group do not significantly improve the free energy of binding relative to indirect water-mediated interactions (Perona et al., 1993c).



The crystal structure of trypsin D189S revealed that an acetate ion from the crystallization buffer was trapped at the base of the binding pocket, such that its carboxylate group was partially oriented toward substrate (Perona et al., 1994; Fig. 10). Exogenously added acetate provided up to 300-fold rate enhancements to trypsin D189S toward Arg- and Lys-containing substrates, but catalytic activity remained diminished relative to wild-type trypsin. This structure thus provides a second example showing that optimal placement of the negative charge in the binding pocket is critical to catalysis. Significantly, the diminished activities of both trypsins D189G/G226D and D189S/acetate are reflected in kcat as well as Km. Measurement of activities toward analogous ester as well as amide substrates by these enzymes allows calculation of the mechanistic parameters Ks, k2, and k3 (Zerner & Bender, 1964; Fig. 2C), removing the ambiguity in interpretation of the steady-state Michaelis-Menten parameters. This analysis shows that the role of the Asp 189 carboxylate in trypsin is twofold: it provides both tight binding affinity and a high acylation rate, k2 (Perona et al., 1994). Therefore, the precise location of the negatively charged group within the trypsin S1 site is critical to positioning the scissile bond in catalytic register with Ser 195 and His 57.
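These mechanistic parameters follow from the two-step acyl-enzyme scheme usually written for serine proteases: E + S ⇌ ES (dissociation constant Ks), acylation of ES with rate constant k2, and deacylation with rate constant k3. In the steady state this gives kcat = k2k3/(k2 + k3), Km = Ks·k3/(k2 + k3), and kcat/Km = k2/Ks. Because an ester and the corresponding amide pass through the same acyl-enzyme, k3 can be taken from ester data, where acylation is fast and kcat approaches k3; the amide kcat and Km then yield k2 and Ks. The Python sketch below encodes these textbook relations as an illustration; it is not the fitting procedure used in the cited work, and the parameter values are arbitrary.

```python
from typing import NamedTuple


class Mechanistic(NamedTuple):
    ks: float  # dissociation constant of the Michaelis complex (M)
    k2: float  # acylation rate constant (s^-1)
    k3: float  # deacylation rate constant (s^-1)


def steady_state(p: Mechanistic) -> tuple[float, float, float]:
    """Return (kcat, Km, kcat/Km) for E + S <=> ES -> acyl-E -> E."""
    kcat = p.k2 * p.k3 / (p.k2 + p.k3)
    km = p.ks * p.k3 / (p.k2 + p.k3)
    return kcat, km, kcat / km  # the last term simplifies to k2 / Ks


def mechanistic_from_amide(kcat_amide: float, km_amide: float, k3: float) -> Mechanistic:
    """Recover Ks and k2 for an amide substrate, given k3 from the matching ester.

    Assumes the ester and amide form the same acyl-enzyme (same k3) and that
    k3 exceeds the amide kcat, as kcat = k2*k3/(k2 + k3) requires.
    """
    if k3 <= kcat_amide:
        raise ValueError("k3 must exceed the amide kcat")
    k2 = kcat_amide * k3 / (k3 - kcat_amide)
    ks = km_amide * (k2 + k3) / k3
    return Mechanistic(ks=ks, k2=k2, k3=k3)


# Round trip with arbitrary illustrative parameters (not values from the text)
truth = Mechanistic(ks=1e-3, k2=10.0, k3=100.0)
kcat, km, efficiency = steady_state(truth)
print(kcat, km, efficiency)
print(mechanistic_from_amide(kcat, km, truth.k3))
```

The round trip recovers the starting Ks and k2, which is a quick consistency check on the algebra.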

Analysis of the kinetic properties of the 20 variants isolated from the genetic selection corroborates these hypotheses regarding the operation of the S1 site. Although the binding constants of the enzymes vary widely, it is significant that relative affinities for Lys versus Arg substrates remain very similar (Perona et al., 1993a). The negatively charged carboxylate in these mutants is provided by either Asp or Glu at position 189 or 190, and the partner to this residue is 1 of 10 different amino acids. Thus, it is very unlikely that equal reductions in affinity toward Lys versus Arg substrates can in most cases be attributed to an equal loss of hydrogen-bonding or electrostatic interactions. Instead, binding affinity is likely to be better correlated with accessibility of the negative charge to substrate; barring substrate-induced conformational changes, this accessibility will be the same for both Lys and Arg substrates. Binding affinities are then predicted to be weaker when the carboxylate is partially sequestered from substrates, as seen in the structures of the mutants D189G/G226D and D189S/acetate. Crystal structures of additional variants from the selection pool should enable a quantitative correlation between binding affinity and accessibility of the negative charge. These experiments also explain the rationale for conservation of the Asp at position 189 in the vast majority of trypsin homologs, because other locations result in partial sequestration of the negative charge.

In a second set of experiments, site-directed mutagenesis has been used to convert trypsin into a chymotrypsin-like protease possessing high selectivity for cleavage adjacent to large hydrophobic amino acids (Hedstrom et al., 1992, 1994a, 1994b). The structures of the S1 pockets of the two enzymes are very similar (Figs. 6, 11A), so it was expected that specificity modification might be as straightforward as in subtilisin and α-lytic protease. However, when the amino acids directly in contact with substrate were exchanged into trypsin, the resulting variants D189S and D189S/Q192M/I138T/T218 failed to exhibit significant improvement in cleavage of P1-Phe amide substrates (Graf et al., 1988; Hedstrom et al., 1992; Table 3). Poor efficiency was also shown toward trypsin substrates, as expected because the pocket lacks a negative charge. The crystal structure of trypsin D189S showed that only very local structural changes were introduced as a consequence of the substitution; the binding pocket maintains a trypsin-like conformation (Perona et al., 1994). This confirms that the small structural differences between trypsin and chymotrypsin in the S1 site (Fig. 11A) must be critical determinants of the specificity and must rely on more distant parts of the structure for maintenance of their particular conformations.

Fig. 11. A: Comparison of the S1 sites of trypsin and chymotrypsin. Van der Waals surfaces of each enzyme are shown with the position-189 amino acid (Asp in trypsin; Ser in chymotrypsin) indicated in red. In yellow is the conserved Ser 190, which is oriented into the S1 pocket in trypsin but rotates out in chymotrypsin. The inserted Thr 218 in chymotrypsin is shown in green. Two other amino acids directly in or adjacent to the S1 site are Ile 138 (Thr 138 in chymotrypsin) and Gln 192 (Met 192 in chymotrypsin). Although a high degree of structural similarity is clear, exchange of these four amino acids fails to transfer chymotryptic specificity to trypsin. B: Structural determinants required to exchange substrate specificity include two adjacent surface loops (loop 1 and loop 2) and an amino acid (Tyr 172 in trypsin) in a third adjacent segment (loop 3). None of these structural elements directly contacts substrate (shown at top in thin green lines). Trypsin is shown in red and chymotrypsin in green.

Exchange of the two surface loops, loop 1 and loop 2 (Fig. 11B), resulted in the hybrid enzyme Tr→Ch[S1+L1+L2], which exhibited an acylation rate constant k2 equal to that of chymotrypsin toward peptidyl P1-Phe amide substrates (Hedstrom et al., 1992; Table 3). However, the enzyme was still reduced by orders of magnitude in kcat/Km because of a very weak substrate binding affinity. The mechanistic kinetic parameters Ks, k2, and k3 were calculated for cleavage of both single-residue and peptidyl P1-Phe amide substrates for the enzymes trypsin, chymotrypsin, D189S, and Tr→Ch[S1+L1+L2]. These data showed that, like chymotrypsin, the hybrid trypsin was able to use the binding energy obtained by occupancy of the S2-S4 enzyme sites to increase the acylation rate. They also demonstrated that, among this series of enzymes, the key mechanistic step that determines substrate specificity is not binding affinity, but instead the chemical step of acylation (Hedstrom et al., 1992, 1994a).

Further mutations were sought to improve catalytic efficiency toward chymotryptic substrates by increasing binding affinity. The additional mutation Y172W in a third adjacent surface loop (Fig. 11B) produced the hybrid enzyme Tr→Ch[S1+L1+L2+Y172W], which improves the activity of Tr→Ch[S1+L1+L2] by 20- to 50-fold, creating an enzyme with up to 15% of the activity of chymotrypsin (Hedstrom et al., 1994b; Table 3). The improvement toward a tetrapeptide P1-Phe amide substrate is manifested almost entirely in tighter binding affinity. The relative catalytic efficiencies measured toward Trp, Tyr, Phe, and Leu P1-amide substrates also more closely mimic those of chymotrypsin (Hedstrom et al., 1994b).
Table 3. Conversion of trypsin to chymotryptic specificity(a)

Enzyme                     Ks (M)         k2 (s⁻¹)   k3 (s⁻¹)
Trypsin                    >0.25          >0.2       36
D189S                      0.015          0.29       33
Tr→Ch[S1+L1+L2]            0.011          20         37
Tr→Ch[S1+L1+L2+Y172W]      5.0 × 10⁻⁴     41         63
Chymotrypsin               1.5 × 10⁻³     850        52

(a) Substrate: suc-Ala-Ala-Pro-Phe-pNA.
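For the acyl-enzyme mechanism, kcat/Km equals k2/Ks, so the Table 3 entries can be combined into overall catalytic efficiencies. The Python sketch below does this for the enzymes whose Ks and k2 are given as values rather than limits (trypsin is omitted). It assumes that the Ks entries for the two hybrids and chymotrypsin read 0.011, 5.0 × 10⁻⁴, and 1.5 × 10⁻³ M; those readings of the printed exponents are an assumption of this sketch.

```python
# (Ks in M, k2 in s^-1) from Table 3, substrate suc-Ala-Ala-Pro-Phe-pNA.
# Assumed readings of the printed Ks exponents; trypsin omitted (limits only).
TABLE_3 = {
    "D189S": (0.015, 0.29),
    "Tr->Ch[S1+L1+L2]": (0.011, 20.0),
    "Tr->Ch[S1+L1+L2+Y172W]": (5.0e-4, 41.0),
    "Chymotrypsin": (1.5e-3, 850.0),
}


def efficiency(enzyme: str) -> float:
    """Overall catalytic efficiency kcat/Km (M^-1 s^-1), computed as k2/Ks."""
    ks, k2 = TABLE_3[enzyme]
    return k2 / ks


chymo = efficiency("Chymotrypsin")
for name in TABLE_3:
    eff = efficiency(name)
    print(f"{name:24s} {eff:10.3g} M^-1 s^-1 ({100 * eff / chymo:.2g}% of chymotrypsin)")

gain = efficiency("Tr->Ch[S1+L1+L2+Y172W]") / efficiency("Tr->Ch[S1+L1+L2]")
print(f"Y172W gain over Tr->Ch[S1+L1+L2]: {gain:.0f}-fold")
```

With these readings, the Y172W hybrid reaches roughly 14% of the k2/Ks of chymotrypsin and is about 45-fold more efficient than Tr→Ch[S1+L1+L2], consistent with the figures of up to 15% and 20- to 50-fold given in the text.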



The structural basis for the activities of the two hybrid trypsins was elucidated by determination of their crystal structures complexed with the transition-state inactivator suc-Ala-Ala-Pro-Phe-chloromethyl ketone (suc-AAPF-CMK; Perona et al., 1995). Loop 2 of Tr→Ch[S1+L1+L2] adopts a conformation identical to that which it possesses in chymotrypsin. However, amino acids at positions 185-187 within Loop 1 are disordered. The structure of Tr→Ch[S1+L1+L2+Y172W] showed improved order in Loop 1 and a rearrangement of solvent structure and Ser 217 side-chain orientation, each of which more closely mimicked the structure of chymotrypsin. No other changes were present between the two hybrid enzymes, implicating these structural elements as important determinants of Ks in chymotrypsin.

Both hybrid enzymes possess wild-type chymotrypsin-like acylation rates k2 toward peptidyl P1-Phe amide substrates, and each utilizes binding of the extended peptide (substrate sites P2-P4) to increase this rate. In fact, the large specificity of chymotrypsin relative to trypsin for cleavage at P1-Phe is manifested solely in extended peptidyl substrates; a much smaller level of discrimination exists for single-residue substrates (Hedstrom et al., 1994b). In all available crystal structures of the enzymes, including those of the trypsin hybrids, two hydrogen bonds are formed in antiparallel β-sheet fashion with the backbone amide group of Gly 216 (Perona et al., 1995). The backbone conformation at Gly 216 differs between trypsin and chymotrypsin; the hybrid enzymes adopt a chymotrypsin-like conformation. This suggests that the Gly 216 backbone is a critical specificity determinant, because it directly binds a portion of the substrate responsible for a large preference at position P1. The mechanism by which Gly 216 functions is likely to be through promoting accurate scissile bond positioning (Perona et al., 1995). Because Asp 189 of trypsin also plays a critical role in this function, it appears that the identity of the amino acid at position 189 and the backbone conformation at Gly 216 must be matched in order to permit efficient and specific catalysis by trypsin and chymotrypsin.

Structural comparisons among a number of the chymotrypsin-like proteases, including both PPE and HNE, showed a striking correlation between the P1-site specificity and the backbone conformation at position 216 (Perona et al., 1995). Three structural classes were delineated, which correspond to trypsin-, chymotrypsin-, and elastase-like enzymes (Fig. 12). The role of Gly 216 in promoting accurate substrate positioning may thus be a feature of many enzymes in the family. In this context it is relevant to note that the kinetic phenomenon observed for both trypsin (Perona et al., 1993c) and chymotrypsin (Hedstrom et al., 1992), namely that subsite occupancy causes large increases in the rates of the chemical steps of catalysis, is also common to other trypsin-like enzymes, including PPE (Thompson & Blout, 1970), HNE (Stein et al., 1987), SGPA (Bauer et al., 1976; Bauer, 1978), SGPB (Bauer, 1978), and α-lytic protease (Bauer et al., 1981; also see above). The significance of the recent kinetic analysis (Hedstrom et al., 1992) is that it shows that both the catalytic rate toward cognate substrates and the degree of specificity at the P1 position depend on the filling of subsites, which themselves exhibit little amino acid preference.

Fig. 12. A correlation is observed between the backbone conformation of residue 216 and the S1-site substrate preference among all of the trypsin-, chymotrypsin-, and elastase-like proteases of known structure. Shown is a superposition of seven mammalian serine proteases (color-coded), indicating the structure at this position, which is most easily visualized in the orientation of the carbonyl oxygen atom. Specific trypsin-like, chymotrypsin-like, and elastase-like backbone angles are observed. Residue 216 binds the P3 position of the substrate in all the enzymes. Extended peptide binding to residue 216 is required both to achieve full catalytic potency and to obtain a maximal level of P1-site discrimination among alternative amino acids. Conversion of the substrate specificity of trypsin to that of chymotrypsin requires reorientation of Gly 216 to a chymotrypsin-like conformation. Thus, the position-216 backbone is strongly suggested as an essential specificity determinant in the mammalian trypsin-like proteases.

The crystal structures of the trypsin hybrids also address another fundamental question in enzyme catalysis: the role of the global protein structure. Distal structural elements such as Trp 172 and loops 1 and 2 play a key role in specifying the conformation of residues that do interact directly with substrate. Thus, their role is not solely to provide an inert platform that stabilizes the amino acids that interact directly with substrate. These elements of the global architecture play an active role in determining substrate specificity as well, which should thus be viewed as a more distributed property of the protein fold. An alternative mechanism for the way in which global protein folds may influence specificity is by modulating the degree of backbone flexibility of the S1 site, as exemplified in the α-lytic protease studies (Bone et al., 1991).

Exchange of the S1-site residues of HNE into trypsin also fails to convert the specificity of trypsin and results, as in the case of the mutants D189S and D189S/Q192M/I138T/T218, in a poor nonspecific protease (J.J. Perona & C.S. Craik, unpubl. obs.). Similarly, introduction of Lys, Arg, or His residues into the trypsin S1 site has failed to generate specificity toward Asp or Glu residues (Graf et al., 1987; Willett et al., 1995; J.J. Perona & C.S. Craik, unpubl. obs.). A better mutational strategy for specificity modification in trypsin may be the construction of libraries that instead span the distal structural elements. When coupled to strategies such as the genetic selection (Evnin et al., 1990; Perona et al., 1993a) or phage display (Corey et al., 1993; Fig. 4C) systems, it should be possible to search a large number of different structures for those providing altered specificity.

Surface loops determine subsite specificity in the trypsin-class enzymes

We have seen that the best-studied members of the chymotrypsin-like class of serine proteases each manifest primary specificity at the P1 site directly adjacent to the cleaved bond. However, there are also several enzymes in the class that possess significant specificity toward substrate residues at a greater distance in both the N- and C-terminal directions. Sequence alignments of these enzymes reveal that a number of surface loops flanking the catalytic residues are very likely to play crucial roles in determining this extended recognition selectivity (Fig. 13).

One enzyme manifesting an extended subsite specificity that is also of known tertiary structure is RMCPII (Woodbury et al., 1978a, 1978b), a member of a homologous subclass of trypsin-like serine proteases expressed also in other granulocytes (Salvesen et al., 1987) as well as in lymphocytes (Lobe et al., 1986). RMCPII and the related RMCPI (which share 73% amino acid sequence identity; LeTrong et al., 1987b) each manifest a chymotrypsin-like primary substrate specificity but also exhibit preferences for hydrophobic amino acids in positions P2 and P3 (Yoshida et al., 1980; Powers et al., 1985). RMCPI has also been shown to prefer hydrophobic residues at position P1' in polypeptide substrates, although the extent of the selectivity has not been established quantitatively (LeTrong et al., 1987a).

Fig. 13. Structure of trypsin, highlighting the positions of four surface loops (loops A, B, C, D) involved in determining subsite preferences among a number of the enzymes in the family. The location of these loops relative to the catalytic machinery and binding cleft may be contrasted with the position of the three loops (loops 1, 2, 3) that combine to influence specificity in the S1 site. A polypeptide substrate chain is shown in green and the catalytic triad is in yellow. It is clear that loop C is positioned to interact with substrate residues N-terminal to the scissile bond, whereas loops A and D are positioned to interact with the C-terminal amino acids on the leaving-group side of the scissile bond.

The crystal structure of uncomplexed RMCPII has been determined at a resolution of 1.9 Å (Remington et al., 1988). This structure suggests that the enhanced substrate selectivity of the homologous RMCPI at the P1' position is likely to come from a large cleft not found in the other chymotrypsin-like proteases of known structure. The cleft is formed as a consequence of an unusual conformation adopted by two surface loops that lie adjacent to the catalytic residues (Remington et al., 1988). The loops comprise residues 34-41 (loop A) and 59-64 (loop B). They are positioned so as to be capable of interacting directly with substrate residues C-terminal to the scissile bond (Fig. 13). Modeling of a substrate complex with RMCPII suggests that loop A is most likely to directly contact the P1'-P2' substrate sites, whereas loop B plays a structural role in helping to form the cleft.

The subclass of serine proteases to which RMCPII belongs is distinguished by the absence of the otherwise well-conserved disulfide bond linking residues 191 and 220 (LeTrong et al., 1987b). In the other enzymes, this disulfide bridges the two walls of the S1 site and likely provides a degree of structural rigidity to the cavity (Fig. 7). RMCPII possesses a Phe residue at position 191 and a shortened loop L2 (residues 217-225) relative to chymotrypsin; each of these features is conserved within the subclass (LeTrong et al., 1987b). Modeling of a tripeptide substrate possessing Phe at position P3 shows that the aromatic ring is readily sandwiched between the side chains of Met 192 and Pro 221A. It also makes van der Waals interactions with Phe 191 (Remington et al., 1988). This small hydrophobic pocket is absent in chymotrypsin owing to the presence of the Cys 191-Cys 220 disulfide bond. Thus, the crystal structure provides a plausible rationale for the 100-fold preference of RMCPI and RMCPII for Phe relative to Gly at position P3 (Yoshida et al., 1980).
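A selectivity ratio can be placed on an energetic scale with ΔΔG = RT ln(ratio). The sketch below applies this to the 100-fold Phe/Gly preference at P3. It makes two assumptions: a temperature of 25 °C, and that the 100-fold figure is a ratio of specificity constants (kcat/Km). The text does not say which kinetic parameter was compared. On those assumptions the preference corresponds to roughly 2.7 kcal/mol.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def ddg_from_ratio(ratio: float, temp_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) equivalent to a selectivity ratio.

    Assumes ratio = (kcat/Km)_preferred / (kcat/Km)_other, so that
    ddG = RT * ln(ratio) measures transition-state discrimination.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return R_KCAL * temp_k * math.log(ratio)


# Phe vs Gly at P3 for RMCPI/RMCPII (100-fold, Yoshida et al., 1980)
print(round(ddg_from_ratio(100.0), 2))  # 2.73
```

The logarithm makes the scale additive. Each further 10-fold gain in selectivity adds the same amount of free energy, about 1.36 kcal/mol at 25 °C.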

A second example of extended binding site specificity is provided by the enzyme enteropeptidase (enterokinase). In vivo it cleaves the zymogen trypsinogen at position Ile 16, generating the new N-terminus required for trypsin activity (reviewed in Huber & Bode, 1978). This enzyme hydrolyzes the peptide bond directly C-terminal to the sequence (Asp)4Lys in trypsinogen. It consequently possesses a trypsin-like specificity toward positively charged amino acids in the P1 position. The bovine and porcine enzymes exist as glycosylated, disulfide-linked heterodimers comprising a heavy chain of 115 kDa and a light chain of 43 kDa (Magee et al., 1977; LaVallie et al., 1993). Chemical modification studies established that the catalytic activity and specificity of the enzyme reside in the light chain (Light & Fonseca, 1984). Most recently, cloning and expression of the light chain has revealed 35-40% sequence identity to the trypsin-like class of serine proteases (LaVallie et al., 1993). This study also demonstrated that this subunit possesses full activity toward the fluorogenic peptide substrate (Asp)4Lys-β-naphthylamide. The presence of the heavy chain, however, gives the holoenzyme 100-fold greater catalytic efficiency toward the cognate trypsinogen substrate (LaVallie et al., 1993).
Native enteropeptidase cleaves the (Asp)4Lys sequence in trypsinogen with a catalytic efficiency roughly 10^4-fold greater than that of trypsin (Maroux et al., 1971). Mapping the sequence of the enzyme's light chain onto the structure of trypsin indicates that the peptide Lys 96-Arg 97-Arg 98-Lys 99 (KRRK) is well positioned to interact directly with the negatively charged aspartates occupying positions P2-P5 (LaVallie et al., 1993). This peptide forms part of a surface loop adjacent to Asp 102 (loop C; Fig. 13). Loop C lies on the opposite side of the catalytic triad from loops A and B, which form the cleft important to P1' recognition by RMCPI.

The kinetic basis for the improved specificity of enteropeptidase relative to trypsin for recognition of the (Asp)4Lys sequence is not yet known. By analogy with the known operation of the pancreatic proteases, the specificity would be predicted to arise at least partly from selective acceleration of acylation. That is, enteropeptidase would acylate (Asp)4Lys-β-naphthylamide faster than other peptidyl or single-residue substrates. It is tempting to speculate that enteropeptidase uses a distinct structural mechanism, involving specific interactions with the aspartates, to convert substrate binding energy into a high catalytic rate. The sequence alignment with trypsin reveals two further differences: positions 215-219 at the lip of the S1 site, and the insertion of a residue in loop L3 (Fig. 13). Each may be important for precise orientation of the (Asp)4Lys substrate. Additionally, enteropeptidase possesses a striking 10-residue insertion between residues 58 and 59. This insertion lies in surface loop B, directly behind the KRRK sequence of loop C (LaVallie et al., 1993; Fig. 13). Loops B and C do not contact each other in trypsin. The much larger loop B in enteropeptidase, however, could conceivably make interactions that help maintain the correct orientation of the KRRK residues.
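The prediction that specificity arises from faster acylation can be made concrete with the textbook three-step acyl-enzyme scheme, E + S ⇌ ES → acyl-E → E + P. Here Ks is the dissociation constant of the Michaelis complex, k2 the acylation rate constant, and k3 the deacylation rate constant. In this scheme:

- kcat = k2·k3/(k2 + k3)
- Km = Ks·k3/(k2 + k3)
- kcat/Km = k2/Ks

A substrate that binds no more tightly but acylates faster is therefore discriminated through kcat/Km, even when deacylation limits kcat. This is a minimal sketch of that algebra. The rate constants in the example are hypothetical, not measured values for enteropeptidase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcylEnzymeScheme:
    """Three-step serine protease scheme: E + S <=> ES -> acyl-E -> E + P."""

    ks: float  # dissociation constant of the Michaelis complex (M)
    k2: float  # acylation rate constant (1/s)
    k3: float  # deacylation rate constant (1/s)

    @property
    def kcat(self) -> float:
        return self.k2 * self.k3 / (self.k2 + self.k3)

    @property
    def km(self) -> float:
        return self.ks * self.k3 / (self.k2 + self.k3)

    @property
    def specificity_constant(self) -> float:
        """kcat/Km, which reduces algebraically to k2/Ks."""
        return self.kcat / self.km


# Hypothetical substrates with identical binding (Ks) and deacylation (k3)
# but a 100-fold difference in acylation rate (k2).
slow = AcylEnzymeScheme(ks=1e-4, k2=1.0, k3=10.0)
fast = AcylEnzymeScheme(ks=1e-4, k2=100.0, k3=10.0)

print(f"kcat ratio:    {fast.kcat / slow.kcat:.1f}")
print(f"kcat/Km ratio: {fast.specificity_constant / slow.specificity_constant:.1f}")
```

With these numbers kcat differs only 10-fold, because deacylation becomes rate-limiting for the fast substrate, while kcat/Km still reflects the full 100-fold difference in k2.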

A third example of the importance of surface loops in these enzymes concerns inhibition of the trypsin-like tissue plasminogen activator (TPA) by plasminogen activator inhibitor 1 (PAI-1; Ny et al., 1986). The interaction between TPA and PAI-1 is important in the regulation of the cascade of activities involved in blood clotting (Davie et al., 1991). Surface loop A of TPA (Fig. 13) carries a high density of positively charged amino acids (residues Lys 296-His 297-Arg 298-Arg 299). These residues have been shown to be critical to its interaction with a negatively charged region of PAI-1 (Madison et al., 1990). This was confirmed in an elegant experiment in which loop A of the homologous enzyme thrombin was replaced with that of TPA, which made thrombin susceptible to PAI-1 (Horrevoets et al., 1993). Thus, both extended substrate specificity and specificity of interaction with physiologically important inhibitors can arise from contacts with the same surface loops.

An important activity of crab collagenase is its ability to cleave native triple-helical collagen, a property not exhibited by the canonical pancreatic proteases (Eisen et al., 1973; Tsu et al., 1994). Cleavage occurs within domains of the triple-helical substrate that depart from the strict Gly-Pro-Xaa repetitive sequence. Detailed examination of the cleavage sites by protein sequencing has shown that collagen is cleaved at positions that mirror the P1-site selectivity (Tsu et al., 1994). Sequence alignments of a range of serine collagenases from diverse species fail to reveal clear amino acid similarities that might be correlated with the triple-helical specificity (Sinha et al., 1987; Sellos & Van Wormhoudt, 1992). However, the crystal structure of collagenase complexed with the dimeric protein inhibitor ecotin has allowed construction of a model of collagen interacting with the enzyme (J.J. Perona, C.A. Tsu, R.J. Fletterick, & C.S. Craik, in prep.). Several surface loops, including loops A and D (Fig. 13), may play crucial roles in recognition of the triple helix.

Recently, a novel assay has been introduced that makes it possible to measure relative preferences at positions on the leaving-group side of the scissile bond (Schellenberger et al., 1993). In an initial study, the S1' subsite specificities of trypsin and chymotrypsin from cow and rat were determined by monitoring the reverse reaction of peptide hydrolysis. Acyl transfer was measured to a mixture of 21 peptide nucleophiles of the general structure H-Xaa-Ala-Ala-Ala-Ala-NH2. The decrease in concentration of each nucleophile was monitored by HPLC and measures the ability of that nucleophile to compete with water for attack on the acyl enzyme.

Chymotrypsin hydrolyzes substrates possessing Arg and Lys at the substrate P1' position roughly 10-fold more rapidly than does trypsin. This selectivity is attributable to additional negatively charged residues in two adjacent surface loops (see below). Trypsin exhibits a slight preference for hydrophobic amino acids at this position, relative to chymotrypsin. The data confirm the relative lack of specificity of each enzyme at this position.

Applied to crab collagenase, the method showed a 30-fold preference for P1'-Arg residues; an Arg is also found on the C-terminal side of several of the enzyme's collagen cleavage sites (Tsu et al., 1994). Data have also been obtained for specificities at subsites S1'-S3' for trypsin, chymotrypsin, α-lytic protease, and the cercarial protease from Schistosoma mansoni. In these cases, relative cleavage rates differed by orders of magnitude (Schellenberger et al., 1994).
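All nucleophiles in the mixture compete for the same acyl-enzyme, so their fractional consumption can be converted into relative selectivities. A standard competitive-substrate treatment gives k_A/k_B = ln([A]0/[A]) / ln([B]0/[B]), provided each nucleophile is consumed at a rate proportional to its own concentration. That relation is assumed here, not taken from Schellenberger et al. The sketch applies it to a panel and normalizes to a reference nucleophile. The concentrations in the example are invented for illustration.

```python
import math


def relative_selectivity(n0_a: float, n_a: float, n0_b: float, n_b: float) -> float:
    """k_A / k_B for two nucleophiles competing for the same acyl-enzyme.

    Assumes d[X]/dt = -k_X [X] f(t) with the same f(t) (acyl-enzyme level)
    for every nucleophile, so ln([X]0/[X]) is proportional to k_X.
    """
    return math.log(n0_a / n_a) / math.log(n0_b / n_b)


def rank_nucleophiles(initial: dict, remaining: dict, reference: str) -> dict:
    """Selectivity of each H-Xaa-... nucleophile relative to `reference`, best first."""
    scores = {
        aa: relative_selectivity(initial[aa], remaining[aa],
                                 initial[reference], remaining[reference])
        for aa in initial
    }
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical HPLC readings (uM) before and after the acyl-transfer reaction.
initial = {"Arg": 100.0, "Leu": 100.0, "Ala": 100.0}
remaining = {"Arg": 10.0, "Leu": 60.0, "Ala": 80.0}
for aa, score in rank_nucleophiles(initial, remaining, reference="Ala").items():
    print(f"{aa}: {score:.1f}")
```

The logarithmic form matters when consumption is large. A simple ratio of amounts consumed would understate the selectivity of a nucleophile that is nearly exhausted.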

It is clear from the many known structures of chymotrypsin-like serine proteases that loop C is invariably positioned to contact the extended substrate directly on the N-terminal side, whereas loops A and D interact on the leaving-group side. By contrast, loop B appears less likely to make direct contacts. It is instead positioned to stabilize the primary interactions made by the more forward loops (Fig. 13). Depending on the size and conformation of this loop in different enzymes, it might in principle stabilize either loop A or loop C. A final example of specificity modification in this class involves loop D. Introducing histidine residues at the N- and C-terminal ends of this loop confers metal-dependent specificity for histidine at the P2' substrate position onto rat trypsin (Willett et al., 1995). In general, subsite specificity of chymotrypsin-like proteases is modulated by surface loops rather than by core secondary structure elements. The prospects therefore seem hopeful for engineering novel specificities and for developing "restriction proteases" that might recognize substrate sites from P5 to P2'.

Conclusions and future directions 

One of the questions addressed in these studies is the role of water molecules in mediating enzyme-ligand interactions. Crystal structures of wild-type and variant enzymes complexed with substrate analogs, together with measured affinity constants, allow the importance of particular interactions to be deduced. For recognition of basic Lys and Arg substrate side chains by Asp 189 of trypsin, the conclusion is that a water-mediated interaction can provide a free energy gain comparable to that of a direct contact (Perona et al., 1993c).

These studies have implications for understanding protein-nucleic acid interactions. For example, the crystal structures of the trp repressor-operator complex and of the uncomplexed operator DNA show no direct contacts with base functional groups. They therefore suggest a crucial role for water-mediated interactions in providing DNA sequence specificity (Otwinowski et al., 1988; Shakked et al., 1994). A second-site reversion analysis of the operator DNA further implied a key role for the intervening waters, but a structural analysis of the modified complexes is still required (Joachimiak et al., 1994). Such an analysis of the charge-charge interactions in the trypsin S1 site shows more definitively that solvent can in principle determine specificity. A similar study of the trp repressor and of other systems is warranted, to address how far this phenomenon depends on the local structural context.

Another fundamental question concerns how enzyme structures are designed to give the substrate binding site different degrees of flexibility. The comparison of trypsin and α-lytic protease offers an excellent opportunity to address this issue. Thus far, both kinetic and structural analysis of mutants suggest that the trypsin pocket may be considerably more rigid. However, the two structures are homologous, so the surrounding scaffolds differ relatively little. The problem may therefore be manageable. Which specific interactions bridging the primary and secondary shell residues are most critical for determining flexibility? Are residues located even farther away also important?

An excellent test of our understanding would be to construct a trypsin variant with chymotryptic specificity that carries far fewer than the 15 alterations of Tr→Ch[S1+L1+L2+Y172W]. If the conformation of Gly 216 is indeed crucial to P1-site specificity, the problem reduces to adding certain key mutations to D189S so that Gly 216 can reorient upon substrate binding. Such reorientation is observed in α-lytic protease (Bone et al., 1991; Fig. 9). A deeper understanding of flexibility would have clear application to protein folding and stability as well (Rose & Creamer, 1994).

The degree to which a substrate binding cleft is inherently deformable may be an important parameter governing how easily specificity can be modified. Before the advent of site-directed mutagenesis, it seemed possible that even conservative amino acid changes might cause highly deleterious long-range structural effects. We now know that most substitutions are absorbed locally, so most protein structural contexts have some ability to deform. Protein folding and stability often are not greatly perturbed even by very challenging mutations. The sensitivity of enzyme activity to precise substrate positioning might alternatively suggest that mutating the binding site would usually result in low catalytic activity. However, this appears not to be the case. Among the well-studied binding pockets considered here, the subtilisin S1 and S4 sites and the α-lytic protease S1 site are each readily modified to alter specificity with only limited local substitutions. Only the trypsin S1 site requires extensive nonlocal changes.

Another reason for the difficulty in modifying trypsin substrate specificity could be that the charge-charge interactions in a trypsin transition-state complex require a precise electrostatic environment that is not readily altered (Hwang & Warshel, 1988). The electrostatic potential is presently the least understood force shaping enzyme structure and activity; it is also the only one that operates over large distances. Considerable efforts are underway to improve empirical force fields, so that catalytic free energies can be accurately estimated directly from structural models. Serine proteases are a favored system in these studies owing to the large database of structure-activity information (Bash et al., 1987; Rao et al., 1987; Caldwell et al., 1991; Mizushima et al., 1991; Wilson et al., 1991). Further mutational analysis will therefore also provide an invaluable testbed for new algorithms. Greater insight into the connection between structure and energetics will greatly improve our ability to predict the consequences of mutation. This insight, together with innovative technologies for generating and screening large libraries, may soon lead to new, highly efficient proteases with a broad range of useful properties.
To contribute to the development of the transcription map of human chromosome 21 (HC21), we have used exon trapping from pools of HC21-specific cosmids. Using selected trapped exons, we have identified a novel gene (named TMPRSS2) that encodes a multimeric protein with a serine protease domain. The TMPRSS2 3.8-kb mRNA is expressed strongly in small intestine and weakly in several other tissues. The full-length cDNA encodes a predicted protein of 492 amino acids that contains the following domains:

(i) A serine protease domain (aa 255-492) of the S1 family that probably cleaves at Arg or Lys residues.
(ii) An SRCR (scavenger receptor cysteine-rich) domain (aa 149-242) of group A (6 conserved Cys). This type of domain is involved in binding to other cell surface or extracellular molecules.
(iii) An LDLRA (LDL receptor class A) domain (aa 113-148). This type of domain forms a binding site for calcium.
(iv) A predicted transmembrane domain (aa 84-106).

No typical signal peptide was recognized. The gene was mapped to 21q22.3 between markers ERG and D21S56, in the same P1 clone as MX1. The physiological role of TMPRSS2 and its involvement in trisomy 21 phenotypes or in monogenic disorders that map to HC21 are unknown. © 1997 Academic Press
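The domain boundaries reported in the abstract can be kept in a small annotation table and checked for consistency: every residue lies within 1-492 and no two domains overlap. The same table also exposes the unannotated linker segments. The helpers below are a hypothetical illustration, not part of the published analysis. Only the boundaries themselves are taken from the abstract.

```python
from typing import List, NamedTuple, Tuple


class Domain(NamedTuple):
    name: str
    start: int  # first residue (1-based, inclusive)
    end: int    # last residue (inclusive)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


PROTEIN_LENGTH = 492
DOMAINS = [
    Domain("transmembrane", 84, 106),
    Domain("LDLRA", 113, 148),
    Domain("SRCR", 149, 242),
    Domain("serine protease", 255, 492),
]


def check_layout(domains: List[Domain], protein_length: int) -> List[Domain]:
    """Return domains sorted by start; raise if any is out of range or overlaps."""
    ordered = sorted(domains, key=lambda d: d.start)
    for d in ordered:
        if not 1 <= d.start <= d.end <= protein_length:
            raise ValueError(f"{d.name} lies outside 1..{protein_length}")
    for a, b in zip(ordered, ordered[1:]):
        if b.start <= a.end:
            raise ValueError(f"{a.name} overlaps {b.name}")
    return ordered


def linkers(domains: List[Domain]) -> List[Tuple[int, int]]:
    """Unannotated stretches (start, end) between consecutive domains."""
    ordered = sorted(domains, key=lambda d: d.start)
    return [(a.end + 1, b.start - 1)
            for a, b in zip(ordered, ordered[1:]) if b.start > a.end + 1]


for d in check_layout(DOMAINS, PROTEIN_LENGTH):
    print(f"{d.name:16s} {d.start:3d}-{d.end:3d} ({d.length} aa)")
print("linkers:", linkers(DOMAINS))
```

With these boundaries the LDLRA and SRCR domains abut directly. The only gaps between annotated domains are residues 107-112 and 243-254.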



INTRODUCTION 



Human chromosome 21 (HC21) is the smallest chromosome. Its long arm (21q) is around 40 Mb and contains approximately 600-1000 genes (reviewed in Antonarakis, 1993). Its short arm (21p) is around 10-15 Mb and is highly homologous to those of the other four human acrocentric chromosomes. To date, about 75 HC21 genes have been cloned and partially characterized [Genome DataBase, http://gdbwww.gdb.org, and SWISS-PROT, http://www.expasy.ch]. Trisomy for human chromosome 21 is the most common chromosomal abnormality at birth, leading to the phenotypes known as Down syndrome (Epstein, 1989). In addition, the loci for several monogenic disorders have been mapped to HC21. Dense linkage maps and almost complete physical maps of 21q have already been obtained. They are now used extensively to characterize HC21 genes and in the effort to determine the nucleotide sequence of HC21. Cloning and characterizing HC21 genes is a necessary step toward understanding Down syndrome and the molecular etiology of monogenic disorders that map to this chromosome.

Sequence data from this article have been deposited with the GenBank Data Library under Accession Nos. U75329 (cDNA) and X88229, X88228, X88321, X88043, and X88047 (trapped exons).

To whom correspondence should be addressed at Division de Génétique Médicale, Centre Médical Universitaire, 1 rue Michel-Servet, 1211 Genève 4, Switzerland. Telephone: 41-22-7025707. Fax: 41-22-7025706. E-mail: Stylianos.Antonarakis@medecine.unige.ch.

In our laboratory, systematic exon-trapping experiments have been performed to identify portions of HC21 genes, clone and characterize the corresponding full-length cDNAs and genes, and participate in the international effort to create a transcription map of HC21 (Cheng et al., 1994; Peterson et al., 1994; Tassone et al., 1994; Lucente et al., 1995; Chen et al., 1996). We report here the cloning of a novel serine protease gene, TMPRSS2, which maps to 21q22.3. It is expressed mainly in the small intestine and at lower levels in several other tissues. The predicted TMPRSS2 polypeptide also contains a transmembrane domain, a scavenger receptor cysteine-rich (SRCR) domain, and an LDL receptor class A (LDLRA) domain, and probably belongs to the type II integral membrane proteins. The TMPRSS2 gene is homologous to, but different from, the human enteropeptidase gene, which maps to a different region of HC21 (21q21).

MATERIALS AND METHODS 

Exon Trapping 

Pools of chromosome 21-specific cosmids from the LL21NC02 library (kindly supplied by P. de Jong) were used in exon-trapping experiments (Buckler et al., 1991; Church et al., 1994; Gibco BRL Manual 18449-017). EcoRI- and PstI-digested cosmids were subcloned into the pSPL3 vector, and plasmid DNA was used to transfect COS-7 mammalian cells using LipofectACE (Gibco BRL). Total RNA was isolated from COS-7 cells 24 h after transfection and cDNA was synthesized. The resulting PCR products were subcloned into the pAMP10 vector by UDG (uracil DNA glycosylase) cloning. Cryptically spliced, pSPL3-derived clones were eliminated by oligonucleotide screening. The inserts of the remaining individual pAMP10 clones were then sequenced on an ABI373A automated sequencer by the dideoxy terminator fluorescence method using Taq polymerase. Nucleic acid and amino acid homologies of the resulting sequences were analyzed through BLASTN and BLASTX searches of the nonredundant database (Altschul et al., 1990).

Cloning of TMPRSS2 cDNA 

The 216-bp PCR product derived from trapped exon HMC26A01 was amplified with oligonucleotide primers 26A01A (5'-GCCTGCGGGGTCAACTTGAAC-3') and 26A01B (5'-GGCGGCTGTCACGATCCACTC-3'). It was used as a probe to screen approximately 500,000 clones of a human heart λgt10 cDNA library (Clontech HL3026a). One positive clone (APG1) was isolated. Its 2.4-kb insert was subcloned into the pAMP10 vector and sequenced in both directions using standard oligonucleotide walking protocols for the ABI373 automated sequencer. The nucleotide sequence was verified using RT-PCR products from intestine poly(A)+ mRNA.
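The primer sequences given above can be sanity-checked with a few lines of code. The sketch computes length, GC content, a reverse complement, and a rough melting temperature by the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. The Wallace rule is only a rule of thumb for short oligonucleotides and ignores salt and nearest-neighbor effects. The paper does not report Tm values, so these numbers are illustrative and do not reflect the authors' annealing conditions.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in an oligonucleotide."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough Tm (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    A rule of thumb for short oligos only; ignores salt and nearest-neighbour effects.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, preserving case."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


PRIMERS = {
    "26A01A": "GCCTGCGGGGTCAACTTGAAC",
    "26A01B": "GGCGGCTGTCACGATCCACTC",
}
for name, seq in PRIMERS.items():
    print(name, len(seq), f"GC={gc_fraction(seq):.0%}", f"Tm~{wallace_tm(seq)} C")
```

Both primers are 21-mers with similar GC content (about 62% and 67%). This is consistent with their use as a matched pair in one PCR.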

Chromosomal Mapping 

Two independent methods were used to assign TMPRSS2 to a human chromosome. First, the trapped exon HMC26A01 was PCR-amplified from genomic DNA of a panel of somatic cell hybrids carrying defined segments of HC21. The specific oligonucleotide primers were 26MAP1 (5'-GAGGCTTTCTGCAGCTTCATC-3') and 26MAP2 (5'-CAATCGATGGCATTGGACGG-3'). Second, the insert of the initial trapped exon HMC26A01 was used to probe high-density filters of cosmids from the HC21-specific LL21NC02 library. Finally, DNAs from a panel of HC21-derived YACs were PCR-amplified using either primers 26MAP1 and 26MAP2 or primers 26A01A and 26A01B.

5'- and 3'-RACE (Rapid Amplification of cDNA Ends)

To obtain the 5' end of the TMPRSS2 cDNA, 5'-RACE was performed on human small intestine cDNA. cDNA was made from 1 µg of poly(A)+ RNA (Clontech 6547-1) with the Marathon cDNA Amplification kit (K1802-1). 5'-RACE using nested PCR primers was then carried out with the enzyme Taq Expand High Fidelity (Boehringer Mannheim) according to the manufacturer's protocol. The gene-specific primers were 26A01B (see above) and AP26BB (5'-CCGCTGTGATGCACTATTCC-3'). In two different experiments the same PCR product of 670 bp was generated and subjected to nucleotide sequencing. 3'-RACE was carried out using gene-specific primers
AP26G (5'-GGTTCTGGCTGTGCCAAGC-3') and AP26K (5'-GTGTGGCTTTGGCACTCTCTGC-3'), and a PCR product of approximately 2.0 kb was generated.

Northern Blot Analysis 

The cDNA clone APG1, which contains the complete coding sequence, was used to probe three Northern blots. Two contained poly(A)+ RNA from eight human adult tissues each (Clontech 7759-1, Clontech 7760-1), and one contained four fetal tissues (Clontech 7756-1). Northern blot analysis was performed using standard protocols, with high-stringency washing. A control hybridization with a human actin probe was used to determine the amount of RNA loaded in these Northern blots.
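The actin control hybridization is what allows signals in different lanes to be compared. The paper does not describe any quantification, so the following is an assumed approach, though a common one. Each lane's target signal is divided by its actin signal, and the result is scaled to the strongest lane. Lane names and intensities below are hypothetical.

```python
def normalize_to_loading_control(target: dict, control: dict) -> dict:
    """Target/control ratio per lane, scaled so the highest lane equals 1.0."""
    if target.keys() != control.keys():
        raise ValueError("target and control must cover the same lanes")
    ratios = {lane: target[lane] / control[lane] for lane in target}
    top = max(ratios.values())
    return {lane: ratio / top for lane, ratio in ratios.items()}


# Hypothetical densitometry values (arbitrary units) for three lanes.
tmprss2 = {"lane1": 900.0, "lane2": 150.0, "lane3": 60.0}
actin = {"lane1": 1000.0, "lane2": 500.0, "lane3": 600.0}
for lane, rel in normalize_to_loading_control(tmprss2, actin).items():
    print(f"{lane}: {rel:.2f}")
```

Normalizing to the loading control separates true expression differences from unequal RNA loading. In the example, lane3 has more actin signal than lane2 but a lower relative TMPRSS2 level.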

Comparative Protein Modeling 

The sequences of both the LDLRA and protease domains of TMPRSS2 were submitted to the SWISS-MODEL automated comparative protein modeling server (Peitsch, 1995, 1996). The models were built as follows:

LDLRA domain. SWISS-MODEL could not automatically provide a 3D structure of this domain because its identity to the most similar sequence of known 3D structure was less than …. Using BLAST (Altschul et al., 1990), we identified the Brookhaven Protein Data Bank entry 1LDL (NMR structure of an LDL receptor class A domain) (Daly et al., 1995) as a suitable modeling template. We then aligned the TMPRSS2 LDLRA domain with the sequence of 1LDL and submitted the sequence alignment to SWISS-MODEL using the Optimise mode.

Serine protease domain. This domain was modeled using the First Approach mode of SWISS-MODEL, which provides fully automated template identification and multiple sequence alignment prior to model building. Chymotrypsin (P17538) was identified as a suitable modeling template. The template and TMPRSS2 protease sequences were aligned automatically, and model generation proceeded to completion without human intervention. Sequence-to-structure fitness analysis using both 3D-1D profiles (Lüthy et al., 1992) and ProsaII (Sippl, 1993) did not show any obvious discrepancies. The coordinates of both the LDLRA and the serine protease domains of TMPRSS2 can be found in the SWISS-MODEL Repository (http://www.expasy.ch/swissmod/swmr-top.html).
Exon Trapping Identified a Clone with Homology to Human Proteases

To clone partial gene sequences from human chromosome 21, we used pools of cosmids (from the LL21NC02-Q library) in an exon-trapping experiment and identified more than 550 different potential exons (Chen et al., 1996). One trapped sequence, HMC26A01 (GenBank X88229), of 216 bp showed strong homology to a large number of serine proteases from human and other species. BLASTX analysis, for example, revealed 55% amino acid identity to human prostasin (GenBank L41351; P = 1.3e-15). Other representative homologies included human elastase (P08218), Erinaceus europaeus plasminogen (U33171), and pig human coagulation factor IX (P16293). Because the HMC26A01 trapped sequence was probably derived from an undescribed human serine protease, we set out to clone and initially characterize the full-length cDNA of the corresponding human gene.
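Percent identity figures such as the 55% reported for prostasin depend on how the aligned columns are counted. The sketch below counts identical residues in a pairwise alignment and divides by one of two denominators:

- the full alignment length, gap columns included, which is the convention used in BLAST's "Identities" line;
- only the columns where neither sequence has a gap.

The toy sequences are hypothetical. The sketch illustrates the bookkeeping only and does not reproduce the BLASTX search.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-", count_gaps: bool = True) -> float:
    """Percent identity of two equal-length aligned sequences.

    count_gaps=True divides by all alignment columns; False divides only by
    columns where neither sequence has a gap.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aln_a.upper(), aln_b.upper()))
    identical = sum(1 for x, y in columns if x == y and x != gap)
    if count_gaps:
        denom = len(columns)
    else:
        denom = sum(1 for x, y in columns if x != gap and y != gap)
    return 100.0 * identical / denom


# Hypothetical toy alignment (not TMPRSS2 or prostasin).
print(f"{percent_identity('ACD-EF', 'ACDGEY'):.1f}")                    # 66.7
print(f"{percent_identity('ACD-EF', 'ACDGEY', count_gaps=False):.1f}")  # 80.0
```

The same alignment gives 66.7% or 80.0% depending on the denominator. Identity values from different tools are therefore comparable only when the counting convention is known.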

Isolation of Full-Length TMPRSS2 Coding Sequences

Clone HMC26A01 was used to screen approximately 500,000 clones of a human heart λgt10 cDNA library, chosen because of the expression pattern in Northern blots (see below). One positive clone (APG1), containing a 2.4-kb insert, was obtained, subcloned into the pAMP10 vector, and sequenced. 5'-RACE from intestinal mRNA (again chosen because of the expression pattern) extended the 5'UTR sequence by about 150 nucleotides. It used oligonucleotides close to the 5' end of the APG1 clone. Sequence analysis of both strands revealed an open reading frame of 492 amino acids starting from the most N-terminal methionine codon. The 3'UTR from the original clone APG1 was approximately 0.95 kb. Figure 1 shows the complete nucleotide and predicted amino acid sequence of TMPRSS2. This cDNA was verified by RT-PCR amplification from intestinal RNA using pairs of oligonucleotide primers from the cDNA sequence. Interestingly, no ESTs identical to portions of the TMPRSS2 cDNA sequence were identified in the dbEST database of GenBank (search of February 18, 1997). A number of additional exons from the Chen et al. (1996) study were identical to portions of the TMPRSS2 cDNA. These include HMC44E11 (GenBank X88043), HMC26A05 (GenBank X88228), HMC19A07 (GenBank X88321), and HMC44D02 (GenBank X88047).

FIG. 2. Intron/exon junctions of the TMPRSS2 gene, determined by comparing the cDNA sequence with the publicly available sequences of the human P1 clone 35-H5-C8 (Martin et al., 1994; GenBank Accession Nos. L35675-L35682).
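The ORF definition used here can be expressed as a short scan over the three forward reading frames. The ORF is the longest open reading frame, read from its most N-terminal in-frame ATG to the first stop codon. This is a minimal sketch that assumes the standard genetic code's stop codons and scans the forward strand only. If it matches the authors' definition, it should report 492 sense codons for the TMPRSS2 cDNA. The cDNA sequence (Fig. 1) is not reproduced here, so the example uses toy sequences.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna: str) -> Optional[Tuple[int, int, int]]:
    """Longest forward-strand ORF as (n_codons, start, end).

    start is the 0-based index of the first in-frame ATG after the previous
    stop; end is the index just past the stop codon; n_codons excludes the
    stop. ORFs that run off the end of the sequence are ignored.
    """
    seq = cdna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # most upstream ATG since the last in-frame stop
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if best is None or n_codons > best[0]:
                    best = (n_codons, start, i + 3)
                start = None
    return best


print(longest_orf("ccATGAAATTTTAGgg"))  # (3, 2, 14)
```

Starting each ORF at the first ATG after the preceding in-frame stop mirrors the "most N-terminal methionine codon" criterion. A downstream ATG in the same frame would give a shorter, nested ORF.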

Intron/Exon Junctions

Homology searches of the public databases showed that discontinuous regions of the TMPRSS2 cDNA are identical to portions of human P1 clone 35-H5-C8, which was sequenced by Martin and co-workers (Martin et al., 1994; GenBank Accession Nos. L35675-L35682). Comparing the TMPRSS2 cDNA sequence with the genomic sequence of this P1 clone revealed the intron/exon junctions shown in Fig. 2. Not all junctions are reported in the figure, because the sequence of the entire P1 clone was not available in the public databases. There are likely additional introns 5' to codon 110, between codons 191 and 229, and between codons 241 and 301.
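Junctions like those in Fig. 2 can be screened for the canonical splice-site dinucleotides. Most introns begin with GT at their 5' (donor) end and end with AG at their 3' (acceptor) end. The sketch assumes a record layout of lowercase upstream intron, exon segment between slashes, then lowercase downstream intron. The junction string in the example is hypothetical and not a TMPRSS2 sequence.

```python
def parse_junction(record: str):
    """Split 'intron /EXON/ intron' into (upstream intron, exon, downstream intron)."""
    upstream, exon, downstream = (part.strip() for part in record.split("/"))
    return upstream, exon, downstream


def splice_sites(record: str) -> dict:
    """Check the GT-AG rule at both ends of the exon in one junction record."""
    upstream, _exon, downstream = parse_junction(record)
    return {
        "acceptor_AG": upstream.upper().endswith("AG"),   # 3' end of preceding intron
        "donor_GT": downstream.upper().startswith("GT"),  # 5' end of following intron
    }


# Hypothetical junction record (not a TMPRSS2 sequence).
print(splice_sites("ttttccttgcag /GATCCTGAAG CTTCAGGAAG/ gtaagtgcc"))
```

A False flag is not necessarily an error. A small minority of introns use non-canonical boundaries such as GC-AG, so flagged junctions would warrant a closer look rather than automatic rejection.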

Mapping of TMPRSS2 to Chromosome 21 

PCR amplification with oligonucleotide primers 26MAP1 and 26MAP2 was performed on genomic DNA from rodent-human somatic cell hybrids. These contained either single human chromosomes (NIGMS 2; Drwinga et al., 1993) or specific segments of HC21 (Patterson et al., 1993). The expected 155-bp PCR product was present in somatic cell hybrids WAV17, E7b, 725, 2Fur1, R50-3, GA9-3, 9528C-1, 1881C.13b, 8q-, ACEM 2-10d, JC6A, and 1x4. In contrast, somatic cell hybrids 21q+, 6918-8al, and MRC2-G did not show amplification (data not shown). These data localized this human protease to the region 21q22.3 between markers ERG and D21S56 (Fig. 3).

We used exon HMC26A01 to probe a subset of the cosmid library LL21NC02. One cosmid, Q20A3, was identified as positive. PCR on this cosmid with the same primers 26MAP1 and 26MAP2 produced the expected 155-bp fragment, confirming that Q20A3 contains this exon of the TMPRSS2 gene. Yeast DNA from 79 YAC clones, chosen to cover almost all of HC21 (Chumakov et al., 1992), was used for PCR amplification with two pairs of oligonucleotide primers. The first pair was 26MAP1-26MAP2. The second pair, in the 3'UTR region, was AP26G (5'-GGTTCTGGCTGTGCCAAGC-3') and AP26H (5'-CCAATGTGCAGGTGGAGACC-3'). No positive YACs were identified. Many single YACs in 21q22.3 from the collection of Chumakov et al. (1992) were also tested by PCR with these primers, and no amplification was observed. The absence of positive YACs for TMPRSS2 suggests one of two explanations. Either the HC21 contig (Chumakov et al., 1992) contains at least one gap in the region between markers ERG and D21S56, or the YAC clones available to our laboratory have accumulated deletions.

FIG. 3. Schematic representation of the mapping position of the TMPRSS2 gene on chromosome 21, derived from PCR amplification of somatic cell hybrids and from sequence identities with a chromosome 21 P1 clone (see Results). Representative results from PCR amplification using oligonucleotide primers 26MAP1/26MAP2 (see text) are also shown.

FIG. 4. Northern blot analysis using the TMPRSS2 cDNA as hybridization probe. The RNA filters are from Clontech (Cat. Nos. 7750-…, 7760-1, 7759-1, and 7756-1) and contain 2 µg of poly(A)+ mRNA per tissue indicated. The thick arrow shows the 3.8-kb mRNA species, while the thin arrow indicates the faint 2.0-kb mRNA.

As described above, discontinuous regions of the TMPRSS2 cDNA were identical to portions of human P1 clone 35-H5-C8, which was sequenced by Martin and co-workers (Martin et al., 1994; GenBank Accession Nos. L35675-L35682). This P1 also contained the gene MX1, which maps to 21q22.3 in the interval between ERG and D21S56 (Fig. 3). Therefore, the sequence identity of TMPRSS2 with portions of P1 35-H5-C8 is in agreement with the mapping position obtained using the somatic cell hybrids.



Northern Blot Analysis

The insert of cDNA clone APG1 was used as a probe against three filters containing 2 µg of poly(A)+ RNA from 16 human adult tissues and 4 human fetal tissues. A hybridization signal corresponding to an mRNA species of approximately 3.8 kb was detected (Fig. 4). The difference between the 2.4-kb cDNA clone APG1 and the 3.8-kb RNA species detected in the Northern blot is probably due to continuation of the 3'UTR downstream of the end of clone APG1. 3'-RACE from intestine RNA using oligonucleotides from clone APG1 (oligonucleotide primers AP26G, see above, and AP26K, 5'-GTCTGGCTTTGGCACTCTCTGC-3') yielded a PCR product of approximately 2.0 kb, which corresponds to an mRNA length of 3.8 kb, compatible with the results of the Northern blot analyses (data not shown). The highest level of expression was observed in small intestine, but this gene is also expressed in human adult heart, placenta, lung, thymus, and prostate and in fetal brain and liver. Another weakly hybridizing mRNA species of 2.0 kb was also observed in several tissues. This could be due to alternative splicing, utilization of different transcription start sites and polyadenylation signals, overlapping transcripts, or, most likely, cross-hybridizing transcripts with sequence homology to TMPRSS2. A human actin probe was used to control the amount of RNA loaded (data not shown). The expression of the TMPRSS2 gene appears to be developmentally regulated, since there is strong expression in fetal brain but very little expression in adult brain. In addition, in the lung, expression is high in the adult tissue but low in the fetal tissue.

Type II Transmembrane Protein 

Protein prediction programs that predict transmembrane domains, including http://ulrec3.unil.ch/software/TMPRED_form.html (Hofmann and Stoffel, 1993), suggested that amino acids 84-106 of TMPRSS2 are hydrophobic and likely to form a transmembrane domain (Figs. 1 and 5). This hydrophobic sequence is not preceded by a recognizable leader sequence. These findings are compatible with a type II integral membrane protein in which the amino-terminus is on the cytoplasmic side of the membrane (Parks and Lamb, 1993). These features (a type II integral membrane polypeptide with an extracellular protease domain) are similar to those of mammalian hepsins (Leytus et al., 1988; Tsuji et al., 1991). Hepsin is important for cell growth and maintenance of normal cell morphology (Kurachi et al., 1994); however, the mechanisms underlying its biological activities are unknown.

FIG. 5. Schematic representation of the different domains of TMPRSS2. Numbers correspond to codons of the full-length cDNA shown in Fig. 1. For a description of the domains, see text.
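As a rough illustration of how a hydrophobic stretch such as residues 84-106 gets flagged, the sketch below uses a Kyte-Doolittle sliding-window hydropathy average. This is a swapped-in, simpler technique: TMPRED itself scores segments with statistics derived from the TMbase database, so its output will not match this sketch. The 19-residue window and 1.6 cutoff follow Kyte and Doolittle's suggestion for membrane-spanning segments. Any test sequence is a hypothetical toy, not TMPRSS2.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each window; entry i covers residues i+1..i+window (1-based)."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Return 1-based (start, end) spans covered by runs of windows above threshold."""
    profile = hydropathy_profile(seq, window)
    segments, run_start = [], None
    for i, h in enumerate(profile):
        if h > threshold and run_start is None:
            run_start = i
        elif h <= threshold and run_start is not None:
            # Last qualifying window is i - 1; it ends at residue (i - 1) + window.
            segments.append((run_start + 1, i - 1 + window))
            run_start = None
    if run_start is not None:
        segments.append((run_start + 1, len(profile) - 1 + window))
    return segments
```

The reported span is deliberately generous, because windows that only partly overlap the hydrophobic core can still exceed the cutoff. Programs like TMPRED refine the boundaries and also predict orientation, which this sketch does not attempt.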

LDLRA Domain

In addition to the transmembrane domain, TMPRSS2 contains a protein motif of the so-called LDLRA (low-density lipoprotein receptor A) domain extending from Cys113 to Cys148 (Figs. 1 and 5). This structural motif (PDOC00929; http://www.expasy.ch/cgi-bin/get-prodoc-entry?PDOC00929) was found in the low-density lipoprotein receptor gene, which contains seven successive such domains (Sudhof et al., 1985). A typical LDLRA domain is about 40 amino acids long and contains six disulfide-bonded cysteines (cysteines 113, 120, 126, 133, 139, and 148 in TMPRSS2). Similar domains have been found in both extracellular and membrane proteins, including the VLDL receptor; gp330; the Drosophila putative vitellogenin receptor; human enterokinase; complement factor I; complement components C6, C7, C8, and C9; perlecan; PKD1; and the vertebrate integral membrane protein DGCR2/IDD (Daly et al., 1995). The amino acid comparison of the single LDLRA domain of TMPRSS2 with other similar domains is shown in Fig. 6a. The predicted 3D structure of this domain and its comparison with the first such domain of the LDLR are shown in Fig. 7a. The LDLRA domains form the binding site for LDL and calcium; the acidic residues between the fourth and the sixth cysteines are important for high-affinity binding of positively charged sequences in LDLR ligands (van Driel et al., 1987; Mahley, 1988).
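The cysteine bookkeeping behind this description can be made explicit. The sketch numbers the cysteines within a domain and lists the acidic residues lying between the fourth and sixth cysteines. The pairing function assumes the C1-C3, C2-C5, C4-C6 connectivity reported for LDL receptor class A modules; applying it to TMPRSS2 is a prediction by homology, not a measured result. The toy sequence used to exercise it is hypothetical apart from placing cysteines at the positions listed above.

```python
def cysteine_positions(seq, first_residue=1):
    """Residue numbers of all cysteines, numbering seq from first_residue."""
    return [first_residue + i for i, aa in enumerate(seq) if aa == "C"]


def predicted_ldla_disulfides(cys):
    """Pair six cysteines as C1-C3, C2-C5, C4-C6 (assumed LDL-A connectivity)."""
    if len(cys) != 6:
        raise ValueError("expected exactly six cysteines")
    return [(cys[0], cys[2]), (cys[1], cys[4]), (cys[3], cys[5])]


def acidic_between(seq, first_residue=1, after=4, before=6):
    """(position, residue) for Asp/Glu strictly between two numbered cysteines."""
    cys = cysteine_positions(seq, first_residue)
    lo, hi = cys[after - 1], cys[before - 1]
    return [(pos, seq[pos - first_residue])
            for pos in range(lo + 1, hi)
            if seq[pos - first_residue] in "DE"]
```

For the cysteines listed for TMPRSS2 (113, 120, 126, 133, 139, 148), the assumed connectivity would give the pairs 113-126, 120-139 and 133-148, matching the three disulfide bonds drawn in the Fig. 7a model only if that connectivity holds.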

SRCR Domain 

An SRCR domain (Resnick et al., 1994) was also identified in TMPRSS2, extending from Val149 to Leu242. SRCR domains are approximately 100 amino acids long and rich in cysteine. The overall consensus derived from more than 40 such domains from different proteins revealed conserved residues at 41 of 101 positions (Resnick et al., 1994). Two groups of SRCR domains are recognized, group A and group B, differing in the number of conserved cysteines. The SRCR domain of TMPRSS2 contains the pattern compatible with group A SRCR domains. The sequence homology to different examples of group A SRCR domains is shown in Fig. 6b. SRCR domains were first found in the type I macrophage scavenger receptor (Freeman et al., 1990) but subsequently in many other sequences (for a comprehensive list, see Resnick et al., 1994). The SRCR domain is reminiscent of, but different from, immunoglobulin domains. Proteins with SRCR domains are either at the cell surface or secreted into plasma or other body fluids. Some proteins, such as the WC1 antigen or M130, contain nine or more such domains, while others, such as MSR (macrophage scavenger receptor type I) and the secreted CFI (complement factor I) or cyclophilin C, contain only one. The biochemical functions of the SRCR domain have not been established with certainty; however, most of these domains are involved in binding to the cell surface or to extracellular molecules.
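A consensus "at 41 of 101 residues" is a column-wise count over a multiple alignment. The sketch below performs such a count. It assumes a pre-computed, equal-length alignment and a hypothetical 60% identity cutoff; Resnick et al. may have used a different rule.

```python
from collections import Counter


def column_consensus(alignment, threshold=0.6, gap="-"):
    """Per-column consensus: the residue if its frequency >= threshold, else '.'."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("alignment rows must have equal length")
    out = []
    for col in range(length):
        residues = [s[col] for s in alignment if s[col] != gap]
        if not residues:
            out.append(".")
            continue
        aa, count = Counter(residues).most_common(1)[0]
        # Frequency is taken over all rows, so gaps count against conservation.
        out.append(aa if count / len(alignment) >= threshold else ".")
    return "".join(out)


def conserved_count(consensus):
    """Number of columns that reached the consensus threshold."""
    return sum(ch != "." for ch in consensus)
```

Raising the threshold shrinks the count, so a figure like 41/101 is only meaningful together with the rule that produced it.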

Protease Domain 

The most striking feature of the predicted TMPRSS2 polypeptide is its similarity to members of the serine protease family. The serine protease domain extends from amino acid residue Arg255 to the carboxyl-terminus of the predicted polypeptide. There is approximately 45-55% identity with several members of the serine protease family; the best similarities are with human hepsin (X07002), human enterokinase (P98073), and human kallikrein (P03952). The features of the protease domain of TMPRSS2 are compatible with the S1 family of the SA clan of serine-type peptidases as characterized by Rawlings and Barrett (1994). The prototype of this family is chymotrypsin, and the 3D structures of some of its members have already been resolved. For a comprehensive list of the S1 serine-type peptidases, see SWISS-PROT (http://www.expasy.ch/cgi-bin/lists?peptidas.txt). TMPRSS2 exhibits conservation of serine protease sequence motifs (Fig. 6c); in particular, the active site residues can be identified as His296, Asp345, and Ser441. TMPRSS2 is predicted to cleave after Lys or Arg residues, since it contains Asp435 at the base of the specificity pocket (S1 subsite) that binds the substrate. The predicted 3D structure of the protease domain of TMPRSS2 is shown in Fig. 7b. The protein model was built using the SWISS-MODEL server for automated comparative protein modeling (Peitsch, 1995, 1996) as described under Materials and Methods. It is of interest that TMPRSS2 is highly homologous to hepsin, another protease that contains a transmembrane domain and is thus a type II integral membrane protein with its protease domain in the extracellular space (Kurachi et al., 1994; Leytus et al., 1988; Tsuji et al., 1991). TMPRSS2 contains nine conserved cysteine residues which, by homology to other proteases, most likely form the intrasubunit disulfide bonds Cys826-Cys842, Cys926-Cys993, Cys957-Cys972, and Cys983-Cys1011 and the intersubunit disulfide bond Cys758-Cys912, which probably joins the catalytic protease subunit to the nonprotease part of the polypeptide. The protease domain does not contain potential N-glycosylation sites, while the remainder of the predicted polypeptide contains two such potential sites (N213, in the SRCR domain, and N249). The amino-terminal Ile of the protease domain is preceded by Arg in the context of the peptide sequence Arg-Ile-Val-Gly-Gly (RIVGG), which is typical of the proteolytic activation site of many serine protease zymogens (Rawlings and Barrett, 1994). Cleavage between this Arg and Ile, similar to the activation mechanism of other serine protease zymogens, would convert TMPRSS2 to an activated form consisting of a nonprotease subunit and a catalytic protease subunit linked by a disulfide bond that most probably involves Cys758 and Cys912.

FIG. 7. (a) Ribbon model of the LDLRA domain of TMPRSS2. The NMR structure of the LDL receptor A domain is depicted in blue, while the TMPRSS2 LDLRA homology domain is shown in red. The three disulfide bonds are shown in yellow. (b) Ribbon model of the protease domain of TMPRSS2. The full protein structure is depicted as a gray ribbon, while the active sites are shown with colored residues (His296, blue; Asp345, red; Ser441, green). The side chain of Asp435, which determines the Lys/Arg specificity of the TMPRSS2 protease, is shown in red. The three disulfide bonds are depicted in yellow, while two free cysteines are shown as orange bars.
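Several of the sequence features above come from simple motif scans. The sketch below finds three things: PROSITE-style N-glycosylation sequons (N-{P}-[ST]-{P}), the Arg-Ile-Val-Gly-Gly activation motif, and Lys/Arg residues after which a trypsin-like S1 protease would be expected to cleave. It is meant for a hypothetical toy sequence. The trypsin-like rule ignores substrate context and the proline effect, so it marks candidate sites only.

```python
import re


def n_glycosylation_sites(seq):
    """1-based positions of Asn in PROSITE-style sequons N-{P}-[ST]-{P}."""
    # The lookahead lets overlapping sequons be reported.
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST][^P])", seq)]


def activation_sites(seq, motif="RIVGG"):
    """1-based positions of the Arg (P1) that precedes the new N-terminal Ile."""
    return [m.start() + 1 for m in re.finditer(f"(?={motif})", seq)]


def split_at_activation(seq, motif="RIVGG"):
    """Cut between Arg and Ile of the first motif; None if the motif is absent.

    In the zymogen model described in the text the two chains would stay
    joined by a disulfide bond; this sketch only splits the string.
    """
    m = re.search(motif, seq)
    if m is None:
        return None
    cut = m.start() + 1
    return seq[:cut], seq[cut:]


def trypsin_like_sites(seq):
    """1-based positions of internal Lys/Arg residues (candidate P1 sites)."""
    return [i + 1 for i, aa in enumerate(seq[:-1]) if aa in "KR"]
```

A sequon is only a potential site: whether the Asn is actually glycosylated depends on topology and folding, which is why the text speaks of "potential N-glycosylation sites".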

DISCUSSION 

In this paper we describe the cloning, chromosomal mapping, and initial characterization of a novel gene that maps to human chromosome 21q22.3 and encodes a polypeptide with multiple recognizable domains, namely LDLRA, SRCR, and serine protease domains. In addition, the presence of a transmembrane domain and the absence of a signal peptide suggest that this is a type II integral membrane protein. Further biochemical experiments are necessary to characterize the cellular localization of this protein and its physiological function. The biochemical events leading to activation of the probable serine protease activity are unknown but are likely to be similar to those described above. It is of interest that the predicted TMPRSS2 protein contains additional domains (LDLRA and SRCR) that are potentially involved in binding to extracellular molecules or to the cell surface. The molecules that are cleaved by or that bind to TMPRSS2 are unknown. Northern blot analysis shows that several tissues express the TMPRSS2 gene. The site of strongest expression is the small intestine; however, other tissues, including heart, lung, and liver, also showed a significant amount of TMPRSS2 mRNA. The function of this protein in these tissues remains elusive.

Are there any monogenic disorders associated with TMPRSS2? Several monogenic phenotypes due to mutations in unknown genes have been mapped by linkage analysis to chromosome 21q22.3. These include APECED, an autoimmune disorder (Aaltonen et al., 1994; OMIM 240300); two forms of autosomal recessive deafness (Bonne-Tamir et al., 1996; Veske et al., 1996; OMIM 601072); Knobloch syndrome (Sertie et al., 1996; OMIM 267750); one locus for manic depressive illness (Smyth et al., 1997; OMIM 125480); and one locus for holoprosencephaly (Muenke et al.; OMIM 236100). All of these phenotypes map distal to TMPRSS2, and it is therefore unlikely that TMPRSS2 is a candidate gene for any of these disorders.

Many human disorders are due to deficiency of other serine proteases. For example, deficiencies of coagulation factors such as Factor XII (OMIM 234000), Factor X (OMIM 227600), Factor IX (OMIM 306900), and Factor VII (OMIM 227500) belong to this group. Additional examples are enterokinase deficiency (Hadorn et al., 1969; OMIM 226200), trypsinogen deficiency (Townes, 1965; OMIM 276000), and hereditary pancreatitis due to mutations in the cationic trypsinogen gene (Whitcomb et al., 1996). The generation of mice with targeted disruption of the mouse TMPRSS2 gene will enhance our understanding of the function of this gene and will provide candidate phenotypes for further investigation.

Is overexpression of TMPRSS2 from three gene copies involved in any of the phenotypes of Down syndrome? TMPRSS2 maps outside the so-called Down syndrome critical region (DSCR; between markers D21S17 and ETS2), triplication of which is associated with many phenotypes of Down syndrome (Delabar et al., 1993). However, the existence of a single DSCR has recently been challenged, since rare patients with proximal trisomy 21 not including the D21S17-ETS2 region displayed some of the phenotypes of Down syndrome (Korenberg et al., 1994). In addition, a wider region from D21S17 to and including MX1 was associated with several phenotypes, including the heart defect and some dysmorphic features of the syndrome (Delabar et al., 1993; Korenberg et al., 1994). Since the TMPRSS2 gene is within this interval, it is formally a candidate for some phenotype(s) of Down syndrome. Transgenic mice that overexpress the murine extracellular protein urokinase-type plasminogen activator have been shown to exhibit abnormal phenotypes (learning disabilities) (Meiri et al., 1994). The study of transgenic mice that overexpress the murine homologue of the human TMPRSS2 gene may contribute to understanding the potential involvement of this gene in the pathogenesis of Down syndrome. A mouse model with partial trisomy 16 (which corresponds to a partial human trisomy 21 from APP to MX1) has recently been made (Reeves et al., 1995). It would be of interest to know whether the murine homologue of the TMPRSS2 gene is included in the triplicated part of mouse chromosome 16.

The use of high-throughput screening for early stage drug discovery imposes several constraints on the format of assays for therapeutic targets of interest. Homogeneous cell-free assays based on energy transfer, fluorescence polarization spectroscopy or fluorescence correlation spectroscopy provide the sensitivity, ease, speed and resistance to interference from test compounds needed to function in a high-throughput screening mode. Similarly, novel cell-based assays are now being adapted for high-throughput screening, providing for in situ analysis of a variety of biological targets. Finally, recent advances in assay miniaturization mark a transition to ultra high-throughput screening, ensuring that identification of lead compounds will not be the rate-limiting step in finding new drugs.
Introduction

Continuing advances in molecular biology, human genetics and genomics have accelerated identification of the mechanisms underlying a growing number of human diseases. This progress has increased the number of novel protein targets available for potential therapeutic intervention by drug treatment. Concurrently, novel approaches in combinatorial chemistry and expanded collections of natural products have dramatically increased the number of compounds that can be tested for activity against these targets. The confluence of these two trends towards more potential targets and larger chemical libraries has greatly stimulated adoption of high-throughput screening (HTS) as the primary tool for early stage drug discovery.

HTS is the process by which large numbers of compounds are tested, in an automated fashion, for activity as inhibitors or activators of a particular biological target, such as a cell surface receptor or a metabolic enzyme. Although any assay performed on the bench top can, in theory, be applied in HTS, conversion to an automated format imposes constraints that affect the design of the assay in practice. Procedures that are routine at the bench are often extremely difficult to automate. Also, the more steps an assay requires, the more difficult it is to automate. The ideal assay is one that can be performed in a single well with no manipulation other than addition of the sample to be tested.

A number of assay formats have been developed or modified over the past few years to conform to the constraints imposed by HTS. These assay protocols can be divided into two groups: cell-free assays that measure the biological activity of a relatively pure protein target, and cell-based assays that assess the activity of a target protein by monitoring a biological response of a cell in which the target protein resides. In either case, the protocols require minimal manipulations, can be performed robotically in relatively small volumes, yield robust responses and are relatively impervious to perturbation by solvents and compounds used in drug screening. In this review we describe several of the more recently developed or exploited assay protocols for HTS.

Cell-free assays 

The primary goal in adapting cell-free assays to HTS is to minimize the number of steps required in setting up the assay and in detecting the activity, be it an enzymatic reaction or the binding of two components. This goal has been met to a large extent by development of detection systems that do not require separation of the product of the reaction from substrate or from other components of the assay mixture. Earlier approaches to such homogeneous assay formats relied on proximity-dependent energy transfer. The output of such assays derives from the signal enhancement generated by bringing a source and a distance-dependent amplifier close together. For example, the β-particles of a low-energy radionuclide attached to a ligand will stimulate the fluorescent emission of a scintillant in a bead to which the ligand's receptor is attached [1,2]. More recently, this detection method has been applied to enzymatic reactions, such as that catalyzed by topoisomerase I [3]. As another example of energy transfer assay formats, the rare earth lanthanide Eu3+, when irradiated by light, can transfer its excitation energy in a nonradiative process to the fluorescent protein allophycocyanin if the two are in close proximity. This can occur when a Eu3+-derivatized ligand binds to an allophycocyanin-linked receptor [4,5] or when a Eu3+-derivatized anti-phosphotyrosine antibody binds to a detector-linked phosphorylated substrate of a tyrosine kinase such as Src [6•]. Use of time-resolved fluorescence procedures, which assess emission at specific times following excitation, enhances the sensitivity of this technique by reducing interference from background fluorescence, from test compounds or from assay components [6•,7•]. Finally, enzymatic assays suitable for HTS and based on fluorescence resonance energy transfer between two different forms of green fluorescent protein (GFP) have recently been described [8•].
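The two physical ideas in this paragraph can be written down directly. Förster theory gives the transfer efficiency E = 1/(1 + (r/R0)^6), which is why the signal depends so steeply on proximity. Time-resolved detection works because lanthanide emission decays far more slowly than background fluorescence, so a delayed counting window excludes most of the background. The lifetimes and gate settings in the example are hypothetical round numbers chosen for illustration.

```python
import math


def fret_efficiency(r, r0):
    """Forster transfer efficiency for donor-acceptor distance r and Forster radius r0 (same units)."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def gated_signal(amplitude, lifetime, delay, gate):
    """Counts collected from a single-exponential decay between delay and delay + gate.

    Integral of amplitude * exp(-t / lifetime) over [delay, delay + gate];
    all times in the same unit (microseconds in the example below).
    """
    return amplitude * lifetime * (math.exp(-delay / lifetime)
                                   - math.exp(-(delay + gate) / lifetime))


# Hypothetical values: 10 ns background vs. 500 us lanthanide emission,
# counted from 50 us to 450 us after the excitation pulse.
background = gated_signal(1.0, 0.01, 50.0, 400.0)
lanthanide = gated_signal(1.0, 500.0, 50.0, 400.0)
```

At r = R0 half of the donor's energy is transferred; at twice that distance only about 1.5% is, so the readout is effectively a proximity switch.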

A number of investigators have exploited fluorescence polarization spectroscopy (FPS) as the basis for homogeneous HTS assays of both enzymatic and binding reactions. When fluorescent molecules in solution are excited with polarized light, the degree to which the emitted light retains polarization depends on the extent to which the fluorescent molecule rotates during the interval between excitation and emission. The rapid rotation of small fluorescent molecules in solution results in substantial loss of polarization. If such small molecules bind to larger molecules, their rotational diffusion is reduced and the retention of polarization is correspondingly increased. Thus, by measuring the relative intensity of emitted light in the planes parallel and perpendicular to the plane of the incident polarized light, the extent of rotation of a target molecule, and inferentially the extent of binding of the target molecule to a larger component, can be calculated. For instance, fluorescence polarization has been used to detect the presence of specific drugs or hormones [9,10], to assess antibody binding of fluorescein-conjugated peptides [11] or to monitor DNA:DNA hybrid formation [12]. The recent availability of a 96-well plate reader [13] with high sensitivity to fluorescein and fluorescein conjugates has allowed development of 96-well-based fluorescence polarization assays. Such high-throughput assays for Src family tyrosine kinase activity [14•], for binding of phosphopeptides to Src SH2 domains [15•], for interaction between STAT1 and a γ-interferon receptor-derived phosphotyrosine-containing peptide [16•] and for specific protease activities [17,18•] have recently been described. The sensitivity of fluorescence polarization, the ease and speed with which such assays can be run and the resistance of such assays to interference from absorptive compounds commonly present in complex mixtures [18•] make this procedure highly amenable to HTS.
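The quantities measured in such an assay can be computed as in the sketch below. It uses the standard definitions of polarization P and anisotropy r from the parallel and perpendicular intensities, with an instrument correction factor G assumed to be known. Converting anisotropy to fraction bound by linear interpolation assumes the free and bound tracer are equally bright. The anisotropy limits in the example are hypothetical.

```python
def polarization(i_par, i_perp, g=1.0):
    """P = (I_par - G*I_perp) / (I_par + G*I_perp); often reported as mP = 1000*P."""
    return (i_par - g * i_perp) / (i_par + g * i_perp)


def anisotropy(i_par, i_perp, g=1.0):
    """r = (I_par - G*I_perp) / (I_par + 2*G*I_perp); related to P by r = 2P / (3 - P)."""
    return (i_par - g * i_perp) / (i_par + 2 * g * i_perp)


def fraction_bound(r_obs, r_free, r_bound):
    """Bound fraction of tracer, assuming equal brightness of free and bound forms."""
    return (r_obs - r_free) / (r_bound - r_free)


# Hypothetical readings and limits for a single well.
p = polarization(1.5, 0.5)
r = anisotropy(1.5, 0.5)
bound = fraction_bound(0.15, r_free=0.05, r_bound=0.25)
```

Anisotropy is used for the binding calculation because it adds linearly across species in a mixture, whereas polarization does not.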

Fluorescence correlation spectroscopy (FCS) represents another recently developed detection format eminently suitable for HTS. FCS measures differences in physical states of a target molecule, such as bound versus free or cleaved versus intact, in a homogeneous mixture [19]. Specifically, FCS measures the burst of fluorescent emission of a molecule passing through a small volume of space defined by a sharply focused laser beam. Small molecules diffuse through the volume rapidly and thus yield short bursts of light. Binding of these small molecules to larger molecules reduces their translational diffusion and correspondingly increases the duration of the bursts. Deconvolution of the emission patterns in a sample by appropriate software can yield the relative amounts of the bound and unbound states of a fluorescently tagged ligand. This technology can therefore readily be applied to measure receptor-ligand interactions, DNA-protein interactions, nucleic acid hybrid formation and certain enzymatic reactions [20].
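A minimal sketch of the FCS analysis follows, assuming a binned intensity trace. The autocorrelation G(τ) = <δF(t)δF(t+τ)>/<F>^2 is computed directly. A two-species model (free and bound ligand of equal brightness, lateral diffusion term only, no triplet or axial term) then shows how a larger bound fraction shifts the curve toward longer correlation times. Real instruments fit a fuller model; the parameter values are hypothetical.

```python
def autocorrelation(trace, max_lag):
    """Normalized fluctuation autocorrelation G(lag) = <dF(t) dF(t+lag)> / <F>^2."""
    n = len(trace)
    mean = sum(trace) / n
    dev = [x - mean for x in trace]
    result = []
    for lag in range(max_lag + 1):
        pairs = n - lag
        cov = sum(dev[t] * dev[t + lag] for t in range(pairs)) / pairs
        result.append(cov / mean ** 2)
    return result


def diffusion_term(tau, tau_d):
    """Lateral diffusion factor 1 / (1 + tau / tau_d); axial and triplet terms omitted."""
    return 1.0 / (1.0 + tau / tau_d)


def two_species_g(tau, n_molecules, frac_bound, tau_free, tau_bound):
    """Model G(tau) for free plus bound ligand of equal brightness."""
    return ((1.0 - frac_bound) * diffusion_term(tau, tau_free)
            + frac_bound * diffusion_term(tau, tau_bound)) / n_molecules
```

Fitting `two_species_g` to the measured curve, with the two diffusion times fixed from free-ligand and fully bound controls, leaves `frac_bound` as the quantity reported per well.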



Cell-based assays 

Cell-based assays are an increasingly attractive alternative to in vitro biochemical assays for HTS. Such in vivo assays require an ability to examine a specific cellular process and a means to measure its output. For instance, agonist activation of a cell surface receptor or a ligand-gated ion channel can elicit a change in the transcription pattern of a number of genes. This ligand-induced alteration in transcription can be readily captured by using gene fusions, in which a promoter element responsive to receptor activation is fused to the coding region for an enzyme or protein whose levels can be easily measured. Appreciation of the particular signaling pathway associated with a specific receptor allows identification of the appropriate transcriptional response element required to detect a response. Figure 1 depicts a number of signal transduction pathways, indicating the transcriptional response elements coupled to each pathway. Several reporter genes that generate products adaptable to HTS format are available [21,22]. These are listed in Table 1, with references to recent innovations in their use [23•,24,25,26•]. For instance, the recent report of novel fluorescent, cell-permeable substrates for β-lactamase documents the use of β-lactamase to detect receptor activation in single cells, making it an attractive assay system for high-density HTS [27••].
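Choosing the response element for a given receptor is essentially a lookup keyed on the coupling pathway. The table in the sketch contains only the couplings described in the Figure 1 legend: Gs- and Gi-coupled receptors acting on CRE, and growth factor receptors acting on SRE. The function name and the default reporter are hypothetical.

```python
# Couplings taken from the Figure 1 legend; extend with other pathways as needed.
RESPONSE_ELEMENTS = {
    "Gs": ("CRE", "increase"),                 # cAMP -> PKA -> CREB
    "Gi": ("CRE", "decrease"),                 # dampens CRE activity
    "growth factor receptor": ("SRE", "increase"),  # Ras -> Raf -> MEK -> MAPK/ELK
}


def reporter_construct(receptor_class, reporter="luciferase"):
    """Return (construct name, expected direction of reporter change).

    Raises KeyError for receptor classes not in the table.
    """
    element, direction = RESPONSE_ELEMENTS[receptor_class]
    return f"{element}-{reporter}", direction
```

The expected direction matters for assay design: a Gi-coupled receptor is read out as a decrease in CRE-driven signal, which needs a measurable baseline to decrease from.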

Figure 1. [...] receptors are functionally linked to the modulation of several well-characterized enhancer/promoter elements: the cAMP response element (CRE), nuclear factor of activated T cells (NF-AT), NFκB, serum response element (SRE) and AP1 [48-49]. Upon activation of a Gs-coupling receptor, adenylyl cyclase is stimulated, producing increased concentrations of intracellular cAMP, stimulation of protein kinase A, phosphorylation of the CRE binding protein (CREB) and induction of promoters with CRE elements. Gi-coupling receptors dampen CRE activity by inhibition of [...]
[...] cascade (not shown). (b) Growth factor receptor (depicted by ellipses) activation results in recruitment of Sos (not shown) to the plasma membrane, where it stimulates Ras, which recruits the serine/threonine kinase Raf to the plasma membrane. Once activated, Raf phosphorylates MEK kinase, which phosphorylates and activates MAPK and the transcription factor ELK (Ets-like protein, also known as p62 TCF [ternary complex factor 1]). ELK drives transcription from promoters with SRE elements, leading to synthesis of the transcription factors Fos and Jun, [...]

While cell-based assays using reporter genes have proved effective as an HTS format, detecting more immediate responses to target protein activation provides several advantages, including shorter assay duration and fewer false positives from nonspecific interactions. As indicated in Figure 1, such cellular responses to receptor activation include elevation of a second messenger (for example, Ca2+, cAMP, inositol triphosphate), phosphorylation of an intermediate signaling protein, or subcellular translocation of a signaling molecule. Recent advances in molecular biology and in instrumentation have made it possible to monitor these events in an automated format. For instance, the recent availability of a 96-well fluorescent imaging plate reader (Molecular Devices, Sunnyvale, California, USA) permits HTS of receptor activation by monitoring Ca2+ mobilization in cells preloaded with a fluorescent calcium indicator, such as FLUO-3 (Molecular Probes, Eugene, Oregon, USA). In addition, recombinant cells expressing a calcium-sensitive fluorescent protein, such as aequorin [28•] or a hybrid calmodulin-GFP protein [29••], obviate the need for preloading cells with dyes in order to detect calcium fluxes following stimulation. A separate approach to detecting early events following receptor stimulation involves examining relocalization of specific components of the signal transduction machinery. For instance, MAP kinase (Figure 1) relocalizes from the cytoplasm to the nucleus within minutes following stimulation of an upstream G-protein-coupled receptor [30,31]. Similarly, Barak et al. [32•] have shown that recruitment of a β-arrestin-GFP fusion protein to the plasma membrane can be used to monitor activation of a number of different G-protein-coupled receptors. Recent advances in microscopic imaging technology, in conjunction with software permitting automated image recognition, provide a means to capture these events in a high-throughput mode.
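For the calcium-flux readout, a common way to summarize each well is the peak fractional change ΔF/F0 over a pre-stimulation baseline. The sketch assumes evenly sampled traces per well, a baseline taken from the first frames, and a hypothetical hit threshold. It does not model any particular plate reader's export format.

```python
def peak_response(trace, n_baseline):
    """Peak (F - F0) / F0 after the baseline, with F0 the mean of the first n_baseline frames."""
    f0 = sum(trace[:n_baseline]) / n_baseline
    return max((f - f0) / f0 for f in trace[n_baseline:])


def call_hits(traces, n_baseline, threshold):
    """Wells whose peak fractional response reaches the (hypothetical) threshold."""
    return [well for well, trace in traces.items()
            if peak_response(trace, n_baseline) >= threshold]
```

Normalizing to each well's own baseline compensates for well-to-well differences in cell number and dye loading, which raw peak fluorescence would not.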



Cell-based assays have significant advantages over in vitro assays. First, the starting material (the cell) self-replicates, avoiding the investment involved in preparing a purified target, in chemically modifying the target to suit the screen, and so on. Second, the targets and readouts are examined in a biological context that more faithfully mimics the normal physiological situation. Third, cell-based assays can provide insights into bioavailability and cytotoxicity. Mammalian cells are, however, expensive to culture and difficult to propagate in the automated systems used for HTS.

Table 1
Reporter gene (source) | Advantages | Disadvantages | References
β-galactosidase (bacteria) | Well characterized; stable, inexpensive substrates; highly sensitive fluorescent or chemiluminescent substrates available; little interference from test compounds; simple readouts (readily automated) | Endogenous activity (mammalian cells); tetrameric (non-linear response at low concentrations) | [23,50]
Luciferase (firefly) | Monomeric; high specific activity; no endogenous activity (low background) | Requires addition of cofactor (luciferin) and presence of O2 and ATP |
Alkaline phosphatase (human placental) | Secreted protein (avoids the need for membrane-permeable substrates); inexpensive colorimetric and highly sensitive luminescent assays available | Endogenous activity in some cell types; optimal at alkaline pH | [24,23]
β-lactamase | Monomeric; highly sensitive fluorogenic substrates described; no endogenous activity | Membrane-permeable fluorescent substrates not readily available | [27••]
GFP (jellyfish) | Monomeric; no substrate needed (no manipulations required for assay); no endogenous activity; multiple forms available | Relatively low specific activity | [26,31,32]

An alternative to mammalian cell-based assays is to recapitulate the desired human physiological process in a micro-organism such as yeast [33]. For instance, signaling via human G-protein-coupled receptors has been reconstituted in yeast to yield a facile growth response or a reporter gene readout ([34,35]; Klein et al., unpublished data). Similarly, mammalian ion channels have been coupled to a growth response in yeast [36]. Also, protein-protein interactions, including RAS-RAF association [37] and tyrosine kinase receptor-ligand binding [38], have been faithfully reproduced using the yeast two-hybrid system. Finally, many mammalian transcription factors operate in yeast, including the glucocorticoid receptor [39,40] and the retinoic acid receptor and retinoid X receptor families of receptors [41]. The ease and low cost of growing yeast, their ready genetic manipulation, and their resistance to solvents make yeast an attractive option for cell-based HTS.

Miniaturization

Several factors are fueling efforts to increase the speed of HTS and decrease the volume of individual reactions within an HTS format. Split-bead synthesis (see Note added in proof) and other similar approaches to combinatorial chemistry dramatically increase the number of compounds that can be produced in a library, but do so at the cost of quantity of material. In addition, the limited supply of existing compounds within the chemical libraries of pharmaceutical companies, and the growing number of targets against which such compounds can be tested, motivate a frugal approach to use of those compounds. Finally, the reagent costs associated with HTS, when multiplied by the increasing number of assays per run, are becoming a significant cost of early stage drug discovery.

In response to these exigencies, a number of groups
have begun to develop formats for very high density
screening using very small assay volumes. One approach
involves reducing the well size and increasing the density
of the assay plate but retaining the overall assay format
used in current 96-well-based HTS. Densities of 6500
assays in a 10 cm array have been reported for cell-free
enzyme-based assays [42*] and for ligand binding in
cell-based assays [43**]. This approach of miniaturizing
existing formats significantly increases the number of
assays per plate and the overall throughput of the screen,
but is intrinsically limited by the physical constraints
of delivering small volumes to wells, and of detecting
responses in a sensitive and timely manner. Accordingly,
novel formats have been developed that eschew the
assay format based on wells. One approach uses glass
chips containing microchannels in which reagents, target
proteins and compounds are herded by electrokinetic flow
controlled by electric potentials applied at the ends of the
channels [44*]. A related approach attains high throughput
of both chemical synthesis and activity assessment by
parallel arrays of three-dimensional channels in which
flow is controlled by miniature hydrostatic actuators [45].
These approaches provide a significant reduction in the
volume of assays and a corresponding savings in reagent
costs over conventional HTS [45]. In addition, with
further development in parallel processing in multiple
chips, the number of assays performed in a given period
of time can increase dramatically. This movement to
miniaturization is likely to ensure that the initial stage of
drug discovery, identification of lead compounds, will not
be the rate-limiting step in finding new drugs.

Conclusions 

The last decade has witnessed the emergence across the
pharmaceutical industry of the 96-well-based, robotics-driven,
high-throughput screening process as the primary
tool for identifying active compounds in the first stage
of drug discovery. This program has dictated the format
of the assays that are used to assess the activities of targets
(enzymes, receptors, transporters and so on) that
underlie drug discovery in various therapeutic areas.
A number of such formats (resonance energy transfer
and fluorescence polarization spectroscopy in cell-based
assays) have gained widespread acceptance and growing
incorporation into high-throughput screening programs.
The growing number of potential therapeutic targets,
the increasing number of screenable compounds, the
accelerating costs of screening and the increasing pressure
to generate more lead compounds in a shorter time all
conspire to render even the new approaches inadequate for
meeting the anticipated throughput requirements, however.
Thus, we are likely to witness a movement towards
even greater screening throughput by miniaturization and
increased reliance on robotics. Whether a new standard
format for screening emerges in the near future, or whether
a variety of formats are pursued concurrently, remains
to be seen. Nonetheless, we can anticipate that the
exigencies of drug screening will motivate a continued
application of state-of-the-art technologies to the process
of high-throughput screening.

Note added in proof 

For a reference describing split-bead synthesis, see [53]. 
High-throughput screening: advances in assay technologies

G Sitta Sittampalam*, Steven D Kahl* and William P Janzen†



Both isotopic and nonisotopic assay methodologies are
employed in high-throughput screening for drug discovery.
Recent advances in cell-based and in vitro biochemical
assays will be reviewed, with special emphasis on detection
technologies amenable to automated 'mix and read'
procedures in high-throughput screening. A major trend is
the advent of homogeneous assay systems which employ
fluorescence resonance energy transfer, fluorescence
polarization, and fluorescence correlation spectroscopy.
Cell-based assay systems have also become popular
in high-throughput screens in which active compounds
that directly modulate the disease target are identified.
Colorimetric and amperometric methods have also been
described recently, but have yet to be widely adapted to
high-throughput screens.
DABCYL 4-(4'-dimethylaminobenzeneazo)benzoic acid

EDANS 5-(2'-aminoethyl)aminonaphthalene sulfonic acid

FCS fluorescence correlation spectroscopy

FLIPR fluorescence imaging plate reader

FPA fluorescence polarization assay

FRET fluorescence resonance energy transfer

HTRF homogeneous time-resolved fluorescence

HTS high-throughput screening

RET resonance energy transfer

SPA scintillation proximity assay

WGA wheat germ agglutinin



Introduction 

The discovery of pharmaceutical agents with novel
structures and potential therapeutic activity is a complex
process. It usually begins with intensive studies of
the physiological and clinical manifestations of diseases,
followed by the identification of relevant genes and/or
associated biological targets for therapy. Recent advances
in molecular biology and DNA sequencing techniques
have made tremendous progress toward sequencing large
genomes [1]. It is anticipated that the sequencing of the
entire human genome, which consists of ~3000 megabases
(over 100,000 genes), will be completed in the early part
of the next century. Hence the identification of genes that
determine the expression of biological targets associated
with human disease is rapidly advancing, opening new
and exciting opportunities for the discovery of life-saving
drugs.

Coupled with these advances are developments in combinatorial
chemistry, where large and structurally diverse
chemical libraries are being generated at an unprecedented
rate using parallel synthesis [2]. Innovations in
powerful computers, automation and software technology
have provided an ideal environment to test hundreds of
thousands of compounds for biological activity, identifying
active molecules or 'hits' that can rapidly develop into
potential drugs or 'leads' with desired therapeutic activity.

High-throughput screening (HTS) is the process of testing
a large number of diverse chemical structures against
disease targets to identify 'hits'. Excellent introductions
to and reviews of HTS have
been published recently [3**,4*,5,6**]. Briefly, current
state-of-the-art HTS operations are highly automated and
computerized to handle sample preparation, assay procedures
and the subsequent processing of large volumes of
data. Each one of these steps requires careful optimization
to operate efficiently and screen 100,000-300,000 compounds
in a one-month period. Hence a modern HTS operation
is a multidisciplinary field involving analytical chemistry,
biology, biochemistry, synthetic chemistry, molecular biology,
automation engineering and computer science [5].

Central to the HTS process is an in vitro biochemical
or cell-based assay using a validated biological target
representing a disease state. In this paper, we will focus on
current assay technologies that are employed in HTS, with
emphasis on their advantages and disadvantages. Developing
detection technologies with potential applicability to
HTS will also be briefly reviewed.

HTS instrumentation and capabilities

In general, the instrumentation used in HTS assays should
be accurate, reliable and easily amenable to automation.
Analytical methods should be robust and reproducible,
with stable reagents and signal responses. Signal-to-noise
(S/N) ratios should be large enough to generate signal
windows [7*] that allow reliable detection of 'hits'. Equally
important are assays with 'mix and measure' protocols,
which are easier to automate than analytical methods with
complex separation steps such as centrifugation, washing
and filtration. This is particularly true as the industry
moves toward ultra-HTS assays, which will screen over
100,000 compounds per day [8]. Another advantage of
'mix and measure' assays is that binding measurements
are made under equilibrium conditions (without washing,
filtration, etc.), and are therefore useful for investigating
low-affinity interactions [9].
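The text asks for S/N ratios large enough to give a usable signal window but gives no formula here. As a hedged illustration only, the sketch below computes the Z' factor, a widely used screening-window statistic, and a simple signal-to-background ratio from replicate control wells. Z' is a swapped-in metric, not necessarily the signal-window definition of ref. [7], and the control readings are invented.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Z' = 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|.

    Values closer to 1 mean the two control distributions are well
    separated relative to their spread.
    """
    mp, mn = mean(positive), mean(negative)
    sp, sn = stdev(positive), stdev(negative)
    return 1.0 - 3.0 * (sp + sn) / abs(mp - mn)


def signal_to_background(positive, negative):
    """Ratio of mean maximal signal to mean minimal (background) signal."""
    return mean(positive) / mean(negative)


# Hypothetical control-well readings (arbitrary units), not data from the text.
max_signal = [1000, 980, 1020, 1010, 990]
min_signal = [100, 110, 90, 105, 95]

print(round(z_prime(max_signal, min_signal), 3))
print(signal_to_background(max_signal, min_signal))
```

A large S/B ratio alone does not guarantee a usable window; Z' also penalizes well-to-well scatter, which is what limits reliable 'hit' calls.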
Standard HTS assays are currently run in 96-well microtiter
plates in batch formats, since automation and detection
instruments have been designed to be compatible
with these plates. Combinatorial chemical synthesis can
also be carried out in 96-well plates, making these plates a
standard platform in nearly all HTS operations. Although
assays in 384-well plates (as well as 864- and
1536-well plates, which use the same plate dimensions) are being
tested, assay formats based on these high-density plates
have yet to be widely implemented.
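To make the throughput argument concrete, the arithmetic below estimates how many plates per day the ultra-HTS rate quoted earlier (about 100,000 compounds per day) would require at each plate density mentioned. It is a minimal sketch assuming one compound per well; the `control_wells_per_plate` parameter is a hypothetical addition for wells reserved for controls.

```python
import math


def plates_per_day(compounds_per_day: int, wells_per_plate: int,
                   control_wells_per_plate: int = 0) -> int:
    """Plates needed to test each compound once, one compound per well.

    control_wells_per_plate is hypothetical: wells reserved for controls
    are not available for test compounds.
    """
    usable = wells_per_plate - control_wells_per_plate
    if usable <= 0:
        raise ValueError("no wells left for test compounds")
    return math.ceil(compounds_per_day / usable)


# Ultra-HTS rate quoted in the text: about 100,000 compounds per day.
for density in (96, 384, 864, 1536):
    print(density, plates_per_day(100_000, density))
```

Moving from 96 to 1536 wells cuts the daily plate count by roughly a factor of 16, which is the main attraction of the higher-density formats for automation.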

Common therapeutic targets for HTS are enzymes, cell
surface receptors, nuclear receptors, ion channels, and
signal transduction proteins [3**]. Compounds that interact
with these targets are usually identified using in vitro
biochemical assays; however, cell-based assays using engineered
mammalian cell lines are now widely employed
in HTS. This is because the ligand interaction occurs in
the biological environment of the target, which provides
opportunities to simultaneously monitor secondary cellular
events such as cytosolic Ca2+ mobilization and other
G-protein-coupled signaling. In addition, the target need
not be purified extensively in order to be compatible
with the in vitro screening conditions. Cell-based assays
also screen simultaneously for the bioavailability of test
compounds when intracellular targets such as nuclear
receptors are involved. A major disadvantage, however,
is the cost and difficulty of producing stable, engineered
eukaryotic cell lines. Special techniques, instrumentation,
and reagents compatible with cell-based assays have to
be developed. Once these are in place, however, HTS laboratories
are able to employ cell-based screens routinely. Detection
technologies available for both types of assays will be
reviewed below.

Detection technologies

Radiochemical methods 

Detection technologies employed in high-throughput
screens depend on the type of biochemical pathway being
investigated. For example, in vitro receptor binding assays
with Kd values in the nanomolar to picomolar (nM-pM)
range generally employ radiometric detection. The same
is true for protein-protein interaction assays with Kd
values in the micromolar to nanomolar (μM-nM) range.
Enzymatic assays, on the other hand, routinely employ
colorimetric, fluorimetric and radiometric detection.

Although filtration-based receptor binding assays have
been used extensively in the past (to separate the
bound and free radiolabeled ligand), the scintillation
proximity assay (SPA) has become the standard assay
in many HTS operations, mainly because it does not
require a separation step and can be easily automated
[9-21]. SPA can also be easily
adapted to a variety of enzyme assays [13,14,15*,16] and
protein-protein interaction assays [9,18,19].

One version of SPA utilizes polyvinyl toluene (PVT) microspheres
or beads (~5 μm diameter, density ~1.05 g/cm3)
into which a scintillant has been incorporated (Figure 1;
[8]). When a radiolabeled ligand is captured on the
surface of the bead, the radioactive decay occurs in close
proximity to the bead and effectively transfers energy to
the scintillant, which results in light emission. When the
radiolabel is displaced or inhibited from binding to the
bead, it remains free in solution and is too distant from
the scintillant for efficient energy transfer. Energy from
radioactive decay is dissipated into the solution, which
results in no light emission from the beads. Hence the
bound and free radiolabel can be distinguished without the
physical separation required in filtration assays.

Figure 1. Principles of scintillation proximity assay (SPA) technology. (a) When the radioligand is free in solution, the path length of decay for the β-particle released by the isotope does not reach the SPA bead, and the energy is dissipated in the aqueous medium, resulting in little or no detection. (b) When the radioligand is bound to the SPA bead (through a specific capture molecule), the β-particle released is capable of exciting the scintillant contained within the bead, and detectable light is emitted.

The outer surface of the SPA bead is coated with a
hydrophilic polyhydroxy film that reduces the hydrophobicity
of the bead and thereby reduces nonspecific interactions. This film
has been chemically derivatized to covalently couple
generic capture molecules. PVT beads with the following
capture molecules are commercially available: protein A,
avidin, streptavidin, wheat germ agglutinin (WGA), glutathione,
and sheep antimouse, donkey antirabbit and donkey
antisheep antibodies. All of these capture molecules
are used routinely as one member of a detection-pair
system. These beads are easily pipetted into 96-well plates using
automated liquid handling devices and, therefore,
are easily accommodated into HTS operations.

The ideal isotopes for labeling ligands used in SPA
assays are 3H and 125I. This is because the β particles
from 3H have a relatively short pathlength, about 1.5 μm,
which easily fulfils the distance requirement for SPA. The
Auger electrons emitted by 125I, which travel between
approximately 1 μm and 17.6 μm in aqueous media, also
satisfy this distance requirement. Other isotopes commonly
used in biology (14C, 35S, 33P) emit β particles
with longer pathlengths and are not suitable for SPA
beads, since their decay is detected by the scintillant
even when the ligand is not bound to the surface of
the bead (this is called the nonproximity effect). An
SPA using a 33P-labeled substrate for the cytomegalovirus
protease has been reported, however [15*]. The decay
pathlength for this isotope is ~126 μm, and it is not
clear how the nonproximity effect was avoided in this
case. In a similar screen using a 33P-labeled peptide for
calcineurin phosphatase activity, the nonproximity effect
was successfully minimized by a simple centrifugation
of assay plates [16]. Other enzyme assays, for topoisomerase
I [13] and N-acetylgalactosaminyltransferase [14],
utilized 3H-labeled substrates. The disadvantages of using 3H
are that the signals can be quite small, and that disposal
requires special precautions due to its long half-life. Other
recent applications of SPA beads include a toxicokinetic
study of antisense oligonucleotides in plasma [17] and
a kinetic analysis of inositol triphosphate binding to its
receptor [20]. It appears that the use of SPA technology
may rapidly expand beyond HTS into other areas of
drug discovery and development, such as genomics, cell
metabolism and toxicology.
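The isotope-selection rule above reduces to comparing each emitter's pathlength with the short distance over which bead-bound label can excite the scintillant. The sketch below encodes only the pathlengths quoted in the text; the 20 μm cutoff is a hypothetical value chosen to sit between the quoted 125I and 33P figures, not a number from the source.

```python
# Approximate pathlengths in aqueous media quoted in the text (micrometres).
# For 125I the upper end of the quoted 1-17.6 um Auger-electron range is used.
MAX_PATHLENGTH_UM = {
    "3H": 1.5,
    "125I": 17.6,
    "33P": 126.0,
}

# Hypothetical cutoff: the text gives no single number, so 20 um is an
# assumption placed between the 125I and 33P values above.
PROXIMITY_CUTOFF_UM = 20.0


def suited_to_spa_beads(isotope: str, cutoff_um: float = PROXIMITY_CUTOFF_UM) -> bool:
    """True if the quoted pathlength is within the assumed cutoff, i.e. label
    free in solution should rarely reach the bead scintillant."""
    return MAX_PATHLENGTH_UM[isotope] <= cutoff_um


for iso, path in MAX_PATHLENGTH_UM.items():
    status = "suited" if suited_to_spa_beads(iso) else "nonproximity risk"
    print(f"{iso}: {path} um -> {status}")
```

Long-range emitters such as 33P fail this check, which is why the text notes that extra steps (for example, plate centrifugation) were needed to control the nonproximity effect.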

SPA can also be carried out in scintillating microplates
[9,21,22*], in which the scintillant is directly incorporated
into the plastic or is coated on the inner surface of
the wells. These plates are available from two sources.
Flashplate® is from NEN™ Life Science Products
(Boston, MA); in these plates the scintillant is coated on the inner
surface of the wells. The ScintiStrip® plate is from Wallac
Oy (Turku, Finland) and is made by incorporating the
scintillant into the entire plastic. With appropriate washing
(not a 'mix and measure' technique), these plates offer the
advantage of eliminating nonproximity effects. In addition,
these plates are available without the licensing fees required
for the bead technology. One example is a
protein-peptide interaction screen in which the binding of
a 13 amino acid phosphopeptide fragment of the epidermal
growth factor (EGF) receptor to the GRB2-SH2 binding
domain was investigated using ScintiStrip® plates [9].
The screen consisted of adding the compounds to be tested
and the 125I-labeled phosphopeptide, respectively, to
a plate pre-coated with the GRB2-SH2 binding domain,
followed by a one-hour incubation at room temperature.
It was, however, necessary to remove all liquid from the
wells and then air-dry the plates before counting.
This removal is essential to minimize nonproximity effects,
which contribute to background noise. An additional
advantage of these plates is that they are compatible with
other isotopes such as 35S, 33P and 32P.

A more recent development is the Cytostar-T™ (Amersham
Life Sciences, Cardiff, Wales) scintillating microplate
[21], which was specially designed for cell-based
proximity assays. Scintillant is incorporated into the base
plate of the microtiter plate, which can also detect additional
isotopes such as 14C, 45Ca, 35S and 33P. These plates have
been successfully used to monitor radiolabeled thymidine
uptake by cultured cells, and to measure 45Ca2+ flux
through ionotropic glutamate-gated ion channels. The
Cytostar-T™ plates were also used to detect mRNA
transcripts in a high-volume in situ hybridization [22*]. This
is an interesting example of how HTS assay concepts are
being applied to gene expression and target identification
studies.



Non-isotopic detection methods

Colorimetry and luminescence

Colorimetric and luminescence detection methods have
significant advantages for HTS laboratories, particularly
in light of the cost, safety and disposal issues associated
with radiochemical methods. HTS operations require
relatively large amounts of reagents during the scale-up,
operations and follow-up phases. Radiolabeled reagents
are expensive, and the scientists running radioactive
screens should be adequately trained and monitored.
Since luminescence methods can be as sensitive as
radioactive methods, with low detection limits, these
techniques are being used increasingly in HTS assays
[23,24*,25-29,30*,31-34,35*,36,37,38*,39*,40-42,43**,44-51].
Glazer [24*] and Czarnik [25], as well as the Fluorescent
Chemosensors and Biosensors Database on the
World Wide Web (URL: http://biomednet.com/fluoro/),
have reviewed the utility of and need for fluorescence-based
techniques for biological applications, which can be easily
extended to HTS assays.

Resonance energy transfer 

Resonance energy transfer (RET; Figure 2) between a
fluorophore and a chromophore was one of the earliest
methods developed for HTS. A peptide substrate for an
HIV protease was synthesized with EDANS (at the amino
terminus) as the donor fluorophore, and DABCYL (at
the carboxyl terminus) as the acceptor chromophore [26].
Energy transfer from EDANS to DABCYL in the intact
peptide resulted in quenching of EDANS fluorescence.
On cleavage by HIV protease, the fluorescence of the
cleaved tetrapeptide-EDANS was restored to the free
fluorophore level. Using this assay, inhibitors of HIV
protease activity were identified using a simple 'mix
and measure' assay format [26]. Although a 40-fold
enhancement of the fluorescence signal could be obtained
in this assay, there are several disadvantages to the
DABCYL-EDANS pair. Many organic and natural product
compounds absorb around the absorption and emission
maxima of EDANS (λabs = 340 nm, λem = 490 nm). These
organic and natural product compounds can also quench
the EDANS fluorescence, generating false positives. Any
trace contamination of the peptide substrate with free
EDANS would result in a high fluorescence background.
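In a quenched-substrate protease assay of this kind, an inhibitor keeps the substrate intact and the signal low, so percent inhibition is read off the window between a no-enzyme background and an uninhibited (fully cleaved) control. The sketch below is a minimal illustration under that assumption; the function name and all readings are hypothetical, with the 40-fold window chosen to match the enhancement reported in the text.

```python
def percent_inhibition(f_sample: float, f_uninhibited: float, f_background: float) -> float:
    """Percent inhibition from endpoint fluorescence of a quenched-substrate assay.

    f_uninhibited: enzyme + substrate, no test compound (substrate cleaved)
    f_background:  substrate without enzyme (intact, quenched substrate)
    """
    window = f_uninhibited - f_background
    if window <= 0:
        raise ValueError("uninhibited control must exceed background")
    return 100.0 * (f_uninhibited - f_sample) / window


# Invented readings: a 40-fold window, as reported for EDANS/DABCYL.
background, uninhibited = 100.0, 4000.0
print(percent_inhibition(2050.0, uninhibited, background))  # 50.0
```

The same arithmetic shows the pitfalls named above: a compound that quenches EDANS lowers `f_sample` and reads as inhibition (a false positive), while free EDANS contamination raises `f_background` and shrinks the window.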

Time-resolved fluorescence

A new homogeneous time-resolved fluorescence (HTRF)
technology has been described [27]. The assay utilizes
fluorescence energy transfer between two fluorophores
(a europium cryptate and allophycocyanin, a 105 kDa
phycobiliprotein) as labels. The Eu-trisbipyridine cryptate
(TBP-Eu3+, λex = 337 nm) has two bipyridyl groups that
harvest light and channel it to the caged Eu3+. It has a
long fluorescence lifetime and nonradiatively transfers the
energy to allophycocyanin when the two labels are in close
proximity (>50% transfer efficiency at a donor-acceptor
distance of 9.5 nm). The resulting fluorescence of
allophycocyanin (λem = 665 nm) retains the long lifetime of the
donor TBP-Eu3+, allowing time-resolved measurement.
Both these labels and their spectroscopic characteristics
are very stable in biological media. Several homogeneous
in vitro biochemical assays based on these two labels
have been described [27]: binding of epidermal growth
factor (EGF) to its receptor, a Jun/Fos protein-protein
interaction, and a tyrosine kinase assay. Using
this concept, the first HTS assay for a protease enzyme
(herpes simplex virus type 1) was recently described by
Kolb et al. [28].
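The distance dependence behind the '>50% transfer at 9.5 nm' figure can be sketched with the standard Förster expression E = 1 / (1 + (r/R0)^6). This assumes the cryptate-to-allophycocyanin transfer follows that expression, and it sets R0 to 9.5 nm, which is only a lower bound implied by the text, since E is exactly 50% at r = R0.

```python
def forster_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)^6)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


# Assumption: R0 = 9.5 nm, the distance at which the text reports >50%
# transfer; E is exactly 0.5 at r = R0, so this is a lower-bound R0.
R0_NM = 9.5
for r in (4.75, 9.5, 19.0):
    print(f"r = {r:5.2f} nm -> E = {forster_efficiency(r, R0_NM):.3f}")
```

The sixth-power dependence is what makes the signal a proximity sensor: halving the distance raises transfer to about 98%, while doubling it drops transfer to under 2%.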

Cell-based fluorescence assays

The above methodologies are not easily adapted to
cell-based assays. An interesting fluorescence resonance
energy transfer (FRET) procedure for sensing voltage
across cell membranes has been described recently,
however [29]. The technique uses membrane-permeable,
anionic oxonols, which rapidly locate on the inner or
outer membrane surface depending on the polarization state of
the membrane. FRET occurs between fluorescein-labeled
WGA and the oxonols bound to the outer surface of the
membrane at a resting negative potential. At a positive
potential, the oxonols are relocated to the inner membrane
surface, and FRET is greatly reduced.



388 Analytical techniques



Many fluorescence intensity measurements, including
FRET, can be easily configured on a new instrument
specifically designed for cell-based HTS assays in 96-well
plates, called FLIPR [30*]. FLIPR utilizes a water-cooled
argon ion laser (5 watt) or a xenon arc lamp and a
semiconfocal optical system with a charge-coupled device
(CCD) camera to illuminate and image the entire plate.
The spatial resolution of the optics is ~200 μm at the cell
plane. The plate chamber temperature can be controlled
precisely, and a 96-well pipettor head is integrated into the
instrument. These features allow accurate measurements
of cellular biochemistry in confluent layers of cells
at the bottom of plates. FLIPR software can rapidly
quantify transient fluorescence signals in intact cells that
are growing attached to the bottom of the well. HTS
assays involving intracellular calcium, pH and membrane
potential measurements have been designed using this
instrument [31].

Fluorescence polarization

Another technique that has gained popularity recently is
fluorescence polarization or anisotropy [32-34,35*,36,37,38*].
When fluorescently labeled molecules in solution are
illuminated with plane-polarized light, the emitted fluorescence
will be in the same plane provided the molecules
remain stationary. Since all molecules tumble as a result of
collisional motion, depolarization of fluorescence emission
occurs. This polarization phenomenon is proportional to
the rotational relaxation time ($\rho$) of the molecule, which
is defined by the expression $\rho = 3\eta V/RT$. At constant viscosity ($\eta$)
and temperature ($T$) of the solution, polarization is
directly proportional to the molecular volume ($V$) ($R$ is
the universal gas constant). Hence changes in molecular
volume or molecular weight due to binding interactions
can be detected as a change in polarization. For example,
the binding of a fluorescently labeled ligand to its
receptor will result in significant changes in measured
fluorescence polarization values for the ligand. Once again,
the measurements can be made in a 'mix and measure'
mode without physical separation of the bound and free
ligands. The polarization measurements are relatively
insensitive to fluctuations in fluorescence intensity when
working in solutions with moderate optical intensity.
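The expression $\rho = 3\eta V/RT$ can be evaluated directly to show how strongly binding changes the rotational relaxation time. The sketch below treats V as a molar volume estimated from molecular weight, assuming a typical protein partial specific volume of 0.73 cm3/g and a water-like viscosity at 25 °C (both assumptions, not values from the text). It also computes polarization from parallel and perpendicular emission intensities, P = (I∥ - I⊥)/(I∥ + I⊥). It does not model how ρ maps onto the measured P, which also depends on the fluorophore lifetime.

```python
R_J_PER_MOL_K = 8.314  # universal gas constant, J mol^-1 K^-1


def rotational_relaxation_time(viscosity_pa_s: float, molar_volume_m3_mol: float,
                               temperature_k: float) -> float:
    """rho = 3*eta*V / (R*T) in seconds, with V as a molar volume."""
    return 3.0 * viscosity_pa_s * molar_volume_m3_mol / (R_J_PER_MOL_K * temperature_k)


def molar_volume_from_mass(molecular_weight_g_mol: float,
                           partial_specific_volume_cm3_g: float = 0.73) -> float:
    """Molar volume in m^3/mol; 0.73 cm^3/g is an assumed typical protein value."""
    return molecular_weight_g_mol * partial_specific_volume_cm3_g * 1e-6


def polarization(i_parallel: float, i_perpendicular: float) -> float:
    """P = (I_par - I_perp) / (I_par + I_perp) from the two emission channels."""
    return (i_parallel - i_perpendicular) / (i_parallel + i_perpendicular)


ETA_WATER = 0.89e-3  # Pa*s near 25 C (assumed aqueous buffer)
T = 298.15           # K
free_ligand = rotational_relaxation_time(ETA_WATER, molar_volume_from_mass(1_000), T)
complex_50k = rotational_relaxation_time(ETA_WATER, molar_volume_from_mass(50_000), T)
print(f"1 kDa: {free_ligand * 1e9:.2f} ns, 50 kDa: {complex_50k * 1e9:.1f} ns")
```

Under these assumptions a 1 kDa labeled ligand that binds a 50 kDa receptor slows its rotation about fiftyfold, which is the size change the 'mix and measure' readout exploits.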

A fluorescence polarization assay (FPA) for the
cytomegalovirus protease using a peptide substrate labeled
with biotin and 5-(4,6-dichlorotriazinyl)aminofluorescein
was reported recently [35*]. This assay is similar to the
SPA assay reported earlier [15*], except that the capture
reagent is avidin, and it is added to the enzyme-substrate
mixture. High polarization values were observed when
the enzyme was inhibited and the uncleaved substrate
became complexed with avidin. Another HTS utilizing
an FPA involved the interaction of fluorescein-labeled
peptides containing phosphorylated tyrosine with Src-SH2
domains [38*]. In both cases, a 96-well plate reader
(FPM-2, Jolley Consulting and Research; Round Lake,
Illinois, USA) was used for the HTS. The signal from the
entire plate is read in about three minutes, making assays of
50-100 plates/day quite feasible in HTS laboratories.

Fluorescence correlation spectroscopy

Fluorescence correlation spectroscopy (FCS) has been
recently described for HTS applications [39*,40,41]. FCS
measures time-dependent, spontaneous fluctuations in
fluorescence intensity in very small volumes (nanoliters).
These fluctuations usually result from Brownian motion
associated with chemical reactions, diffusion or the flow of
fluorescently labeled molecules. The average fluctuation
is proportional to the square root of N, where N is
the average number of molecules in the volume. Since
Brownian diffusion is directly affected by molecular
interactions, FCS is an excellent tool to measure binding
interactions [23]. Using powerful lasers and autocorrelation
techniques, sensitive measurements at very low
concentrations can be made both in solution and in cellular
compartments. Access to this technology is limited, since
this instrumentation for HTS is available only through
collaborative agreements on a semiexclusive basis [39*].
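If the absolute fluctuation scales as √N, the relative fluctuation scales as 1/√N, which is why FCS favors tiny observation volumes holding only a few molecules. The minimal simulation below assumes molecules enter and leave the volume independently, so the occupancy number is Poisson-distributed (an assumption not stated in the text). It also reports the squared relative fluctuation, which for Poisson occupancy approximates the zero-lag autocorrelation amplitude G(0) = 1/N used in FCS analysis.

```python
import math
import random


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def occupancy_stats(mean_n: float, n_samples: int = 20000, seed: int = 1):
    """Sample mean and standard deviation of simulated molecule counts."""
    rng = random.Random(seed)
    counts = [poisson_sample(mean_n, rng) for _ in range(n_samples)]
    m = sum(counts) / n_samples
    var = sum((c - m) ** 2 for c in counts) / (n_samples - 1)
    return m, math.sqrt(var)


for n in (1, 10, 100):
    m, sd = occupancy_stats(n)
    print(f"N={n:>3}: sd={sd:.2f} (sqrt(N)={math.sqrt(n):.2f}), "
          f"sd/mean={sd / m:.3f} (1/sqrt(N)={1 / math.sqrt(n):.3f}), "
          f"G(0)~{(sd / m) ** 2:.3f}")
```

At N = 1 the fluctuations are as large as the mean signal, while at N = 100 they shrink to about 10%, so binding-induced changes in diffusion are easiest to resolve at low occupancy.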

Cell-based assay systems for HTS have been thoroughly
reviewed, with guidelines for selecting appropriate screening
systems [43**]. Assay systems using mammalian and
insect cells, as well as yeast and bacterial cells, have
been described. The most common method for detecting
ligand interaction with drug targets expressed in cells
is to employ a reporter gene [3**,43**,45,46,49,50]. This
involves splicing the transcriptional control elements of a
target gene (a gene that controls the biological expression
and function of a disease target) with the coding sequence
of a reporter gene into a vector. This vector is then
transfected into a suitable cell line in order to construct
a detection system that responds to modulation of the
target. Common examples of reporter genes are enzymes
such as chloramphenicol acetyltransferase (CAT), alkaline
phosphatase (AP), firefly and bacterial luciferases, and
β-galactosidase. These enzymes can be detected at very
low levels using colorimetric, chemiluminescent or bioluminescent
products of specific substrates. The chemistry
of chemiluminescent and bioluminescent reactions has
been reviewed in detail [46,47].

A new reporter system using the β-lactamase enzyme with
a membrane-permeable fluorogenic substrate has been
cited for cell-based assays [3**]. The advantage is that
the enzyme is monomeric and has no endogenous activity
in mammalian cells. Since fluorescent substrates are not
yet commercially available, this system has yet to be used
widely in HTS applications.

Future developments and conclusions

Several new trends can be observed in the recent HTS
literature [52-56,57**,58-69]. The use of 384-well plates
in HTS is being investigated [52], which would increase
throughput and reduce reagent cost. Statistical experimental
design tools are being explored to improve the
robustness of assays [53]. New recombinant microorganisms
are being studied to screen for non-antibiotic compounds
[54]. A sensitive colorimetric assay for in vitro molecular
recognition using polymeric artificial membranes has been
described [56,57**,58]. These membranes, which contain
a ligand, can be polymerized into liposomes. These
liposomes change their chromatic properties on binding
to a solubilized target such as a receptor. Developments
in scanning probe microscopy for screening and drug
development [59,60] are quite exciting because the
molecular interaction could be detected without labeling
the target or the ligand.

New analytical devices are also being developed. Detection
devices based on an amperometric sensor chip
[62] and an amperometric electrode probe [63] have been
described. The microarray technology that has been
developed for analyzing gene expression [65], and other
analytical methods used in characterizing combinatorial
libraries [66-69], could be adapted for medium-throughput
screening applications.

The science of HTS is undergoing explosive growth due
to rapid developments in assay technology. Major trends
include the development of nonisotopic detection systems
and the use of cell-based assays. Miniaturization of assay
technologies coupled with automation of high-throughput
combinatorial synthesis is helping to set the stage for
screening 50,000-100,000 samples/day in an ultra-HTS mode.
Bioinformatics systems to collect, analyze, manipulate
and store the massive amounts of data are also being
rapidly developed. When these capabilities are realized,
the multitude of targets derived from the human genome
effort can be screened, using large numbers of structurally
diverse libraries, to generate selective and potent lead
compounds. It is also anticipated that the technologies
developed will greatly contribute to the efficient design
of secondary and tertiary assays used to determine
structure-activity relationships. The net effect would be
the ready availability of multiple, high-quality leads to
develop novel therapies for the treatment and prevention
of disease.
ABSTRACT Tryptases, the predominant serine proteinases
of human mast cells, have recently been implicated as
mediators in the pathogenesis of allergic and inflammatory
conditions, most notably asthma. Their distinguishing
features, their activity as a heparin-stabilized tetramer and
resistance to most proteinaceous inhibitors, are perfectly
explained by the 3-Å crystal structure of human βII-tryptase
in complex with 4-amidinophenylpyruvic acid. The tetramer
consists of four quasiequivalent monomers arranged in a flat
frame-like structure. The active centers are directed toward a
central pore whose narrow openings of approximately 40 Å ×
15 Å govern the interaction with macromolecular substrates
and inhibitors. The tryptase monomer exhibits the overall fold
of trypsin-like serine proteinases but differs considerably in
the conformation of six surface loops arranged around the
active site. These loops border and shape the active site cleft
to a large extent and form all contacts with neighboring
monomers via two distinct interfaces. The smaller of these
interfaces, which is exclusively hydrophobic, can be stabilized
by the binding of heparin chains to elongated patches of
positively charged residues on adjacent monomers or,
alternatively, by high salt concentrations in vitro. On tetramer
dissociation, the monomers are likely to undergo
transformation into a zymogen-like conformation that is favored and
stabilized by intramonomer interactions. The structure thus
provides an improved understanding of the unique properties
of the biologically active tryptase tetramer in solution and will
be an incentive for the rational design of mono- and
multifunctional tryptase inhibitors.



Human mast cell tryptases (EC 3.4.21.59) comprise a family of trypsin-like serine proteinases closely related in sequence that are derived from >3 nonallelic genes (1, 2). Tryptases (at least isoenzymes αI, βI, βII, and βIII) are highly and selectively expressed in mast cells and to a lesser extent in basophils (3, 4). Only β-tryptases, however, appear to be activated intracellularly and stored in secretory granules (5, 6), accumulating to much larger amounts than any other granule-associated serine proteinase of leukocytes and lymphocytes. On mast cell activation, β-tryptases are secreted bound to heparin in diverse allergic and inflammatory conditions ranging from asthma and rhinitis to psoriasis and multiple sclerosis. Various studies performed in animals and humans have provided considerable evidence that tryptases are directly involved in the pathogenesis of asthma (7-9), a hypothesis also supported by apparent genetic links of tryptases to airway reactivity (10, 11).



PNAS is available online at www.pnas.org. 



Several unique properties distinguish tryptases from other trypsin-like proteinases (reviewed in refs. 12 and 13). Most notably, tryptases are enzymatically active in the form of a noncovalently linked tetramer. The tetramer is stabilized by association with negatively charged aminoglycans such as heparin or, in vitro, by high ionic strength. On dissociation, which is reversible only under certain conditions, the monomers lose activity, apparently because of transition into a zymogen-like state (14, 15). This mechanism is thought to govern tryptase activity in vivo. With the exception of the "atypical" Kazal-type inhibitor leech-derived tryptase inhibitor (LDTI) (16, 17), human tryptases are resistant to inhibition by proteinaceous inhibitors. In accordance with their trypsin-like activity, tryptases efficiently hydrolyze a number of peptide substrates, including the neuropeptides "vasoactive intestinal peptide" and "peptide histidine methionine" (18). Few macromolecular substrates are cleaved, however; such cleavage leads to the activation of prostromelysin, prourokinase, and the proteinase-activated receptor-2 (19-21) and to the inactivation of fibronectin and of the procoagulant functions of high-molecular-mass kininogen and fibrinogen (22-24).

These distinguishing features are well explained by the crystal structure of the human lung βII-tryptase tetramer, whose overall architecture has been summarized recently (25). Here, we describe the identification of the tetramer within the crystal packing, the detailed structure of the monomers, and their interactions in the tetramer. In addition, we present structural features likely to favor a zymogen-like conformation of isolated monomers, as well as models of the interaction with stabilizing heparin proteoglycans and inhibitors.

Identification of the Relevant Tryptase Tetramer. In the x-y plane of the tryptase crystals, the tryptase monomers are arranged in flat rectangular tetrameric aggregates that form extended protein layers (Fig. 1a). Within these layers, each tetramer is rotated about the crystallographic a- and b-axes by ≈7°, in agreement with the self-rotation function. The tetramers appear well separated from their neighbors in one direction (x-direction in Fig. 1a) but are in somewhat closer contact in the perpendicular direction (y in Fig. 1a). In the z-direction, the tetramers are stacked along the crystallographic 4₁ screw axis. Because of the 7° tilt of each tetramer from the x-y plane, their projections (Fig. 1b) alternate between leaning to the left, being horizontal, and leaning to the right, respectively, giving rise to a 7° precession motion of the



Abbreviations: APPA, 4-amidinophenylpyruvic acid; LDTI, leech- 
derived tryptase inhibitor. 

Data deposition: The atomic coordinates have been deposited in the Protein Data Bank, www.rcsb.org (PDB ID code 1A0L).
To whom reprint requests should be addressed. E-mail: sommerhoff@clinbio.med.uni-muenchen.de.
Fig. 1. Packing of the human βII-tryptase crystal. (a) View along the z-axis showing one layer of tryptase molecules in the x-y plane. The tryptase monomers are grouped into tetrameric aggregates that form extended sheets. Each of these tryptase tetramers is clearly delimited from its neighbors in both directions. A "reference" tetramer is shown in red for simplicity. (b) View across the z-axis. In the z direction, layers of tetramers are stacked on each other along the 4₁ screw axis. The local 2-fold symmetry axis is tilted from the z direction by ≈7°, causing increased crystal-stabilizing contacts between layers stacked in the z-direction. One unit cell (82.9 × 82.9 × 172.9 Å), occupied by four tryptase tetramers, is indicated by a white bordered box.



local (2-fold; see below) rotation axis along the crystallographic 4₁ screw axis. The largely complementary interaction surfaces between the monomers of the tetramer are typical for intersubunit contacts, whereas neighboring tetramers interact with one another via much more ordinary crystal contacts. Thus, within a tetramer, monomer A (Fig. 2) interacts with monomers B and D via interfaces of 540 and 1,075 Å², respectively (solvent-inaccessible surface probed by using a sphere of 1.4-Å radius; Collaborative Computational Project No. 4 suite). In contrast, the four monomers of a given tetramer interact with monomers from neighboring tetramers via interfaces of less than 280 Å² (in the x-y plane) and 265 Å² (along the z-axis), respectively. The contacts between tetramers include a number of hydrogen bonds and six unique salt bridges and thus are qualitatively similar to those usually observed in typical crystal contacts.
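The interface sizes above are areas of surface made solvent-inaccessible when monomers associate, probed with a 1.4-Å sphere. One common way to compute such values is the Shrake-Rupley dot-surface method. The sketch below applies it to toy spheres and reports buried area as SASA(A) + SASA(B) - SASA(AB). This is an illustration under stated assumptions: it is not necessarily the algorithm of the CCP4 program the authors used, the atom radii and point count are arbitrary, and conventions differ on whether an interface is reported as the total buried area or as half of it.

```python
import math

PROBE_RADIUS = 1.4  # Å, water-sized probe, as quoted in the text


def sphere_points(n):
    """Return n roughly uniform unit vectors (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def sasa(atoms, probe=PROBE_RADIUS, n_points=960):
    """Shrake-Rupley solvent-accessible surface area in Å^2.

    atoms: list of (x, y, z, radius) tuples in Å (toy input; real use
    would assign van der Waals radii per atom type).
    """
    unit = sphere_points(n_points)
    total = 0.0
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            # A test point is buried if it lies inside any other expanded sphere.
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        total += 4.0 * math.pi * big_r ** 2 * exposed / n_points
    return total


def buried_area(chain_a, chain_b, **kw):
    """Surface buried on complex formation: SASA(A) + SASA(B) - SASA(AB)."""
    return sasa(chain_a, **kw) + sasa(chain_b, **kw) - sasa(chain_a + chain_b, **kw)
```

For whole proteins the all-pairs neighbour scan would normally be replaced by a cell list or k-d tree; the brute-force loop keeps the sketch short.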

These packing considerations suggest that the tetramer 
emphasized in Fig. 1 represents the enzymatically active 
tetramer of human β-tryptase. This tetramer selection is
supported by the finding that the six loops that deviate most 
from the structures of other trypsin-like proteinases are all 
involved in forming monomer-monomer contacts within a 
tetramer. More important, this unique tetramer perfectly 
explains the distinguishing properties of tryptase in solution, 
e.g., the resistance to proteinaceous inhibitors other than 
LDTI, the unusual substrate specificity, and the stabilization 
by the binding of heparin-like glycosaminoglycans (see below). 

Overall Tetramer Structure. In the tryptase tetramer, 
monomers (arbitrarily assigned A, B, C, and D in Fig. 2) are 
positioned at the corners of a flat rectangular frame leaving a 
continuous central pore. The tetramer displays almost perfect 
222 symmetry that, however, is not exact because of the 
crystallographically asymmetric environment and an imperfect 



internal packing (see below). The horizontal and the vertical 
2-fold axes, which cross each other in the center of the 
tetramer, relate monomers A to B and C to D, or A to D and 
B to C, respectively. The third 2-fold symmetry axis relating 
monomers A to C and B to D is arranged virtually perpendicular
to the other 2-fold axes and runs almost through their
point of intersection in the central pore. 
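The 222 symmetry described above can be illustrated with a small sketch: three mutually perpendicular 2-fold rotations act on hypothetical monomer centres placed at the corners of a rectangle. The centres are labelled so that the horizontal axis relates A to B and C to D, and the vertical axis relates A to D and B to C. The coordinates are illustrative only, and the real tetramer deviates slightly from exact 222 symmetry.

```python
# Minimal sketch of 222 (D2) point symmetry for a rectangular tetramer.
# Hypothetical geometry: monomer centres at the corners of a rectangle in the
# x-y plane, labelled clockwise so that A and B share a vertical edge.
HALF_X, HALF_Y = 1.0, 0.5  # arbitrary half-edge lengths

CENTRES = {
    "A": (HALF_X, HALF_Y, 0.0),
    "B": (HALF_X, -HALF_Y, 0.0),
    "C": (-HALF_X, -HALF_Y, 0.0),
    "D": (-HALF_X, HALF_Y, 0.0),
}

# 2-fold rotations as 3x3 matrices (row-major tuples of ints).
TWO_FOLD = {
    "horizontal (x)": ((1, 0, 0), (0, -1, 0), (0, 0, -1)),
    "vertical (y)": ((-1, 0, 0), (0, 1, 0), (0, 0, -1)),
    "pore axis (z)": ((-1, 0, 0), (0, -1, 0), (0, 0, 1)),
}


def apply(m, v):
    """Rotate vector v by matrix m."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def matmul(a, b):
    """Product of two 3x3 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def permutation(m):
    """Return which monomer each monomer is mapped onto by rotation m."""
    lookup = {pos: name for name, pos in CENTRES.items()}
    return {name: lookup[apply(m, pos)] for name, pos in CENTRES.items()}


for axis, matrix in TWO_FOLD.items():
    print(axis, permutation(matrix))
```

Composing the two in-plane 2-folds yields the pore-axis 2-fold, so the two axes that cross in the centre of the tetramer already imply the third axis relating A to C and B to D.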

The active centers of the four monomers are directed toward the central pore (Fig. 2). This pore exhibits a rectangular cross section and is twisted by ≈30° about the tetramer axis. It possesses two narrow openings of dimension 40 Å × 15 Å and widens in its central part to a cross section of 50 Å × 25 Å, just large enough for elongated peptides of the diameter of an α-helix to thread through the exits and to interact with the active sites. Both pore entrances are partially obscured by the 147-loops (see below), which project from each of the monomers but on alternating entrance sides, so that only two diagonally arranged active centers can be viewed directly (Fig. 2). With 33 basic (including 12 His residues) and 24 acidic residues per monomer, human tryptase exhibits an average percentage of charged residues comparable to related serine proteinases, but is only slightly positively charged at neutral pH. These charges are not evenly distributed over the molecular surface, however. Rather, negatively charged residues cluster preferentially on the inner pore-facing surface, conferring on the pore a quite negative electrostatic potential, and along the peripheral A-D (and B-C) edges. In contrast, the A-B (and C-D) peripheries and one front side of the monomer surface are positively charged and probably are involved in heparin binding (see below and Fig. 6).

Monomer Structure. The tryptase monomer exhibits the typical β-strand-dominated fold seen in other trypsin-like serine proteinases. The core is made by two six-stranded
Fig. 2. Overall structure of the tryptase tetramer. The four monomers A, B, C, and D (clockwise) are shown as blue, red, green, and yellow ribbons, each surrounded by a semitransparent surface. The inhibitor molecules APPA are given as orange CPK models, each binding into one of the four S1 specificity pockets.

β-barrels that are packed together and further clamped by three transdomain segments (Fig. 3). This core structure is covered by a number of polypeptide loops, a short α-helical turn (Ala-55-Gly-66, not shown in Fig. 3a), and two regular α-helices, the so-called "intermediate helix" (Glu-164-Leu-173A) and the C-terminal helix (Arg-230-Val-242). The catalytic residues Ser-195, His-57, and Asp-102 (chymotrypsinogen numbering) are located in the junction between both barrels. The active-site cleft runs perpendicular to this barrel junction. In the "standard orientation" shown in Fig. 3, this cleft runs approximately horizontally across the molecular surface facing the viewer and is ready to accommodate and bind extended peptide substrates running from left to right. One hundred sixty-two and 168 residues of the tryptase monomer are topologically equivalent to the archetypal proteinases chymotrypsin (26) and trypsin (27), respectively, with an rms deviation of their α-carbon atoms of 0.65 Å for both comparisons. The numbering of the tryptase residues given in this article is predominantly based on the equivalence with chymotrypsinogen (28) and, at only a few trypsin-characteristic sites, on that with trypsin (27).
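The rms deviation quoted above is computed only over residues that are topologically equivalent in both structures. A minimal sketch of that final step follows. It assumes the two structures have already been superimposed (the superposition procedure is not described here) and that equivalent residues share a label in a common numbering scheme such as chymotrypsinogen numbering with insertion codes. The dictionary layout and function name are hypothetical.

```python
import math


def rms_deviation(calpha_a, calpha_b):
    """RMS deviation of C-alpha atoms over topologically equivalent residues.

    calpha_a, calpha_b: dicts mapping a residue label in a shared numbering
    scheme (e.g. "57", "152A") to (x, y, z) coordinates in Å. The structures
    are assumed to be superimposed already. Residues present in only one
    structure (insertions or deletions) are ignored.
    Returns (rmsd, number_of_equivalent_residues).
    """
    common = sorted(set(calpha_a) & set(calpha_b))
    if not common:
        raise ValueError("no equivalent residues")
    sq = 0.0
    for label in common:
        sq += sum((p - q) ** 2 for p, q in zip(calpha_a[label], calpha_b[label]))
    return math.sqrt(sq / len(common)), len(common)


# Toy example: insertion-coded residue 152A has no partner and is skipped.
tryptase = {"57": (0.0, 0.0, 0.0), "102": (1.0, 0.0, 0.0), "152A": (5.0, 5.0, 5.0)}
reference = {"57": (0.0, 0.0, 1.0), "102": (1.0, 0.0, 1.0), "195": (9.0, 9.0, 9.0)}
print(rms_deviation(tryptase, reference))  # (1.0, 2)
```

Keying on a shared numbering scheme is what lets the count of equivalent residues (162 versus 168 above) differ between comparison partners.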

In detail, however, the topology of the tryptase monomers deviates significantly from these reference proteinases (Fig. 3b), probably more than that of any other trypsin-like serine proteinase. In particular, six surface loops that border and shape the active-site cleft are unique (Fig. 3a). These loops comprise the 147-loop (including the 152-"spur"), the 70- to 80-loop, the 37-loop, the 60-loop, the 97-loop, and the 173-loop (Fig. 3a). The 147-loop, which together with Gln-192 forms the rather acidic southern wall of the active-site cleft, is shortened by one residue in its initial part but contains a two-residue insertion (Pro-152-Pro-152A-cisPro-152B-Phe-153-Pro-154) in its proline-rich and hydrophobic 152-spur. The neighboring 70- to 80-loop to the east, which in the calcium-binding serine proteinases winds around a stabilizing calcium ion (27), is three residues shorter and more compact in tryptase. It is probably not designed for calcium binding, in spite of topologically similar liganding groups; Glu-70 and Asp-80, involved in a partially buried salt bridge cluster with Arg-34, are







Fig. 3. The tryptase monomer in standard orientation, i.e., as seen approximately from the middle of the central pore of the tetramer toward the active site of monomer A (represented by Ser-195, His-57, and Asp-102). (a) Ribbon representation of a tryptase monomer. The amidino group of the APPA molecule interacts with Asp-189 in the S1 pocket; Ser-195 Oγ is bound covalently to the APPA carbonyl group, forming a hemiketal. The six unique surface loops of tryptase that surround the active site and are engaged in intermonomer contacts are shown in special colors, namely (anticlockwise) the 147-loop (light blue), the 70- to 80-loop (yellow), the 37-loop (orange), the 60-loop (magenta), the 97-loop (green), and the 173-flap (red). All other tryptase segments are given in dark blue. The side chains of the catalytic triad residues as well as Asp-143, Asp-145, and Asp-147 in the acidic 147-loop are shown as a ball-and-stick model. (b) Overlay of the structures of the tryptase monomer and bovine trypsin, both given as ropes. The color-coding of tryptase is as in a, whereas trypsin is shown in gray. The most relevant deviations from the trypsin backbone appear in the colored loop regions of tryptase.

oppositely arranged to the two calcium-binding Glu residues in trypsin. The 37-loop, above the 70- to 80-loop, possesses two additional residues (Pro-37A and Tyr-37B), which bulge away from the loop axis. The adjacent 60-loop, with five inserted residues, turns away from the cleft abruptly to the north, where it kinks at cisPro-60A to approach the general main-chain course of other serine proteinases. At position 69, a buried Arg replaces the Gly residue that is strictly conserved in most other homologous proteinases, allowing for a special conformation. Although the 97-loop, at the northern rim of the cleft, contains the same number of residues as in other serine proteinases, it differs considerably in conformation. The N-terminal part is shortened by two residues between positions 96 and 97, thus placing Ala-97 in the position normally occupied by residue 99, whereas its C-terminal part makes an unusual extra helical turn before arriving at Asp-102. By far the largest insertion, with nine residues, occurs in the 173-loop. After the unusually long three-turn intermediate helix, the 10 residues from His-173 to Val-173I form an exposed flap centered around the imidazole side chain of His-173.

With 245 amino acid residues, the tryptase monomer possesses 15 and 22 residues more than the B-chains of chymotrypsin and trypsin, respectively. Compared with chymotrypsinogen, most of these extra residues, present in all tryptases known so far, are inserted in the 37-loop (+2), the 60-loop (+5), the 147-loop (+1), the 173-loop (+9), at position 221A (+1), and at the C terminus (+1), whereas the 70- to 80-loop (-3) and the 214- to 220-loop (-1, as in all trypsin-like serine proteinases) are shorter. On the reverse side, the largely hydrophobic cluster of four Trp residues (Trp-27, -29, -207, and -137) is noteworthy. Only the indole moieties of the latter two Trp residues are significantly exposed to the surface. At the C terminus, only the main-chain atoms of the two penultimate residues, Lys-244 and Lys-245, are well defined by electron density, whereas the C-terminal Pro-246 could not be located. The side chain of Asn-204, the single N-linked sugar attachment site in human βII-tryptase, extends away from the molecular surface opposite to the active site. Some residual electron density exists distal to its carboxamide group, but it is not large enough to account for a covalently linked sugar residue.
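The insertion and deletion counts listed above can be checked against the overall length difference. The sketch below sums the net changes relative to chymotrypsinogen. It assumes that the "B-chain of chymotrypsin" refers to residues 16 to 245 in chymotrypsinogen numbering; that reading is an interpretation, not a statement from the article.

```python
# Net length changes of tryptase relative to chymotrypsinogen, as listed in the text.
CHANGES = {
    "37-loop": +2,
    "60-loop": +5,
    "147-loop": +1,
    "173-loop": +9,
    "position 221A": +1,
    "C terminus": +1,
    "70- to 80-loop": -3,
    "214- to 220-loop": -1,
}

TRYPTASE_LENGTH = 245
net = sum(CHANGES.values())
chymotrypsin_equivalent = TRYPTASE_LENGTH - net
# Assumed reference segment: residues 16 to 245 in chymotrypsinogen numbering.
numbered_span = len(range(16, 246))

print(net, chymotrypsin_equivalent, numbered_span)  # 15 230 230
```

The listed loop changes account exactly for the stated 15-residue difference.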

As found in almost all trypsin-like serine proteinases [except, e.g., single-chain tissue-type plasminogen activator (29)], the N-terminal Ile-16-Val-17 segment is inserted in the Ile-16 pocket, forming a solvent-inaccessible salt bridge between its free Ile-16 α-amino group and the carboxylate group of Asp-194. The formation of this salt bridge after activation cleavage creates a functional substrate recognition site by reorienting the Asp-194 side chain from an external position in the zymogen, where it might hydrogen bond to a surface-located His-40-Ser-32 pair forming the so-called "zymogen triad," to an internal position in the active molecule (30, 31). This reorientation restructures the surrounding "activation domain," which in trypsin(ogen) mainly includes the linings of the Ile-16 pocket and the S1 specificity pocket (i.e., segments Ile-16-Gly-19, Tyr-184-Asp-194, Gly-216-Asn-223, and Gly-142-Tyr-151), and the "oxyanion hole" formed by the amide groups of Gly-193 and Ser-195 (28, 32, 33). The single-chain zymogen and the activated monomer are adequately described by a two-state model, in which an inactive conformation is in equilibrium with an active form possessing a structured activation domain (31). The partition between both forms depends on environmental conditions such as the endogenous free Ile-16-Val-17 N-terminal segment (34), free Ile-Val dipeptide (31), ligands in the substrate binding site (30, 36), or other effectors such as fibrin with respect to tissue plasminogen activator or tissue factor in the case of Factor VIIa (29, 37). This conformational partition can be influenced by internal molecular groups that stabilize or destabilize one or the other state. Tryptase possesses the zymogen triad residues His-40 and Ser-32, which would stabilize the zymogen state. In addition, the acidic residues Asp-143, Asp-145, and Asp-147 arranged around the Ile-16 cleft could form a negatively charged anchoring site that could compete with the Ile-16 pocket for the Ile-16 α-amino group, thus destabilizing the structured active state of the tryptase monomer. Furthermore, some of the loops in contact with the activation domain of tryptase, such as the long 173-loop or the 70- to 80-loop, which has been shown to be strongly correlated with the equilibrium state in bovine elastase "subunit III" (38), could influence the structured state. The conformation of the tryptase 173-loop, probably held in place in the tetramer by contacts with monomer D, certainly affects the stability of the integrated monomer. Interestingly, tissue factor, thought to support insertion of the N-terminal Ile-16 α-amino terminus of the activated Factor VIIa B-chain on complex formation (37), likewise binds to the 173-loop at the intermediate helix flank (39).
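The two-state model above can be written with the standard relations for linked equilibria. K is the ratio of active to inactive (zymogen-like) molecules, and a ligand that binds the active form more tightly than the inactive form raises the apparent K. The sketch below is a generic textbook treatment, not a model fitted in this article, and all numbers in it are hypothetical.

```python
def fraction_active(k_eq):
    """Two-state model: fraction of molecules in the active state.

    k_eq = [active] / [inactive] at equilibrium.
    """
    return k_eq / (1.0 + k_eq)


def apparent_k(k0, ligand, kd_active, kd_inactive=float("inf")):
    """Equilibrium constant in the presence of a ligand (linked equilibria).

    k0: intrinsic [active]/[inactive] ratio without ligand.
    A ligand binding the active conformation more tightly
    (kd_active < kd_inactive) pulls the equilibrium toward the active state.
    ligand, kd_active and kd_inactive must share concentration units.
    """
    return k0 * (1.0 + ligand / kd_active) / (1.0 + ligand / kd_inactive)


# Hypothetical numbers: only ~1% active without ligand; a ligand at
# 99 x Kd that binds only the active form shifts the partition to 50%.
print(fraction_active(apparent_k(0.01, 0.0, 1.0)))
print(fraction_active(apparent_k(0.01, 99.0, 1.0)))
```

In this picture, groups that stabilize the zymogen-like state (such as the zymogen triad or the acidic 147-loop cluster) lower the intrinsic K, while active-site ligands or an inserted N-terminus act like the ligand term.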

Interfaces. All monomer-monomer contacts within the tetramer are realized via six loops arranged around the active center. These loops, emphasized by special colors in Figs. 3-5, differ fundamentally in their conformation and partly in size from those of other trypsin-like serine proteinases. Monomers A and B interact with one another through the 147-loop, the 70- to 80-loop, and the 37-loop (Fig. 4d). Each 152-spur slots into a cleft formed by the 37- and the 70- to 80-loop of its own monomer and the 152-spur of the opposing neighbor. At the center of the interface, the side chains of Phe-153 and Tyr-75 from each subunit form an approximate tetrahedron (Fig. 5a). The side chain of Tyr-75 from monomer B (D) would clash with the equivalent A (C) side chain if they were arranged in a symmetrical manner. Instead, the phenolic group of Tyr-75 of monomer A turns in the opposite direction, breaking the 2-fold symmetry (see the partial electron density in Fig. 5a). This A-B (C-D) interface is exclusively hydrophobic, with a remarkable number of Tyr and Pro side chains involved, and lacks any intermonomer hydrogen bonds. Toward the pore, the side chains of the two Arg-150 residues oppose one another. The charges of their guanidyl groups presumably make unfavorable energy contributions to the A-B interaction.

Monomer A interacts with monomer D through the entire northern rim consisting of the 173-flap, the 97-loop, and the 60-loop (Figs. 4a and 5b), again via equivalent loops. Both 97-loops rest with their 95-99 segments on one another (Fig. 4a), with both Ile-99 side chains in direct contact. Further toward both peripheries, segment Pro-60A-Asp-60B and the opposing segment Gly-173B-Tyr-173D run antiparallel to one another, forming two-rung antiparallel ladders between Gly-173B-Tyr-173D and Pro-60A-Val-60C (Fig. 5b). Each Tyr-95 aromatic side chain nestles into the bend of the opposing 173-flap, and each Tyr-173D phenolic side chain slots into a hydrophobic cleft made by the 60-loop and the 97-loop of the opposing monomer. In addition, both monomers are cross-connected by salt bridges between Asp-60B and Arg-224 and




Fig. 4. Loop arrangements in the tetramer. The six special loops engaged in monomer-monomer interactions are shown in the color coding introduced in Fig. 3. (a) The D-A dimer as seen from outside of the tetramer along the local 2-fold axis. (b) The monomer viewed in standard orientation. (c) Front view of the tetramer. (d) The A-B dimer seen from outside of the tetramer along the local 2-fold axis.
Fig. 5. Stick representation of the contact interfaces between monomers. (a) The AB-interface seen from inside the tetramer along the local 2-fold axis, shown together with the final 2Fo - Fc electron density map for both Tyr-75 side chains contoured at 1σ level. The monomers and loops are given in the color coding introduced in Figs. 3 and 4. (b) The AD-interface (half side) observed approximately perpendicular to the local 2-fold axis, shown together with all intermonomer hydrogen bonds and salt bridges (green dots). Segments of monomers A and D are given in blue and yellow, respectively.



by four hydrogen bonds involving both main and side chains (Fig. 5b). Thus, the A-D (and the corresponding B-C) interface comprises a number of polar/charged interactions in addition to several hydrophobic contacts.

The A-B homodimer carries a number of positively charged 
residues at the periphery, which cluster and form an obliquely 
oriented two-lobed patch of positive charges that extends 
toward one of the front sides of each monomer, giving rise to 
the blue-colored electrostatic potential surfaces in Fig. 6. With 
an overall length of almost 100 Å, this patch would allow tight
electrostatic binding of an extended heparin chain of ≈20
sugars running obliquely along the A-B edge as shown in Fig. 
6. The length of such heparin chains is in good agreement with 
the experimentally observed stabilization of the tetramer by 
heparin fractions of molecular mass 5,500 Da and above (40). 
On the peripheral surface of the A-D (and the corresponding 
B-C) homodimer, in contrast, positive charges are counter- 
balanced by adjacent negative ones. 
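The consistency between the ≈100-Å positive patch, a chain of ≈20 sugars, and the 5,500-Da threshold for tetramer stabilization can be checked with back-of-envelope arithmetic. The sketch below assumes two approximate values that are not taken from this article: a rise of about 4.5 to 5 Å per monosaccharide along an extended heparin chain, and an average mass of about 600 Da per sulfated disaccharide. The function names are hypothetical.

```python
import math

# Assumed, approximate values (not from the article):
RISE_PER_SUGAR = 5.0            # Å per monosaccharide along the chain (~4.5-5 Å)
MASS_PER_DISACCHARIDE = 600.0   # Da, rough average for sulfated heparin disaccharides


def sugars_to_span(length_angstrom, rise=RISE_PER_SUGAR):
    """Minimum number of monosaccharides needed to span a linear distance."""
    return math.ceil(length_angstrom / rise)


def chain_mass(n_sugars, disaccharide_mass=MASS_PER_DISACCHARIDE):
    """Approximate mass (Da) of a heparin chain of n monosaccharides."""
    return n_sugars / 2.0 * disaccharide_mass


n = sugars_to_span(100.0)
print(n, chain_mass(n))  # 20 6000.0
```

With a 4.5-Å rise the estimate becomes 23 sugars. Either way, the chain mass comes out on the same order as the experimental 5,500-Da cutoff, which is the point of the comparison in the text.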

Interaction with Substrates and Inhibitors. The immediate vicinity of the tryptase active site is quite similar in structure to that of trypsin. The S1 specificity pocket, which opens to the west of the reactive Ser-195 (Fig. 3a), is virtually identical to that of trypsin and well suited to accommodate P1 Lys and Arg side chains. The 4-amidinophenylpyruvic acid (APPA) molecule inserts into this pocket in the same manner as in the complex with trypsin (41). Thus, its amidino group is hydrogen bonded to both Asp-189 carboxylate oxygens, Gly-219 O, and Ser-190 Oγ, and its phenyl ring is sandwiched between peptide planes 215-216 and 190-192. Ser-195 Oγ bonds to the carbonyl group of the tetrahedral pyruvate part of APPA (Fig. 3a) and hydrogen bonds to His-57 Nε2. As indicated by the relatively low equilibrium dissociation constant of the APPA-tryptase complex [Ki = 0.71 μM (42)], APPA fits well into the tryptase active site. Toward the south of the active site of tryptase, the side chains of Asp-143, Asp-145, and Asp-147 protrude from the relatively flat and hydrophobic southern embankment (Fig. 3a). The resulting negative charge cluster provides a second anchoring point for dibasic synthetic tryptase inhibitors such as bis-benzamidines (17, 42, 43), allowing favorable interactions with a distal basic group such as that in pentamidine. The structural basis of the unexpectedly high affinity of bifunctional inhibitors containing suitably arranged adjacent imidazole moieties, such as those present in the inhibitor BABIM and closely related analogues (43, 44), has recently been revealed: two nitrogen atoms
Fig. 6. Model of the binding of a 20-mer heparin-like glycosaminoglycan chain along the A-B edge of the tryptase tetramer. The solid-surface representation of tryptase indicates positive (blue) and negative (red) electrostatic potential contoured from -4 kT/e to 4 kT/e. The heparin chain (green/yellow/red stick model) is long enough to bind to clusters of positively charged residues on both sides of the monomer-monomer interface, thereby bridging and stabilizing the interface, which is exclusively hydrophobic in nature (see Fig. 5a).

of the two methylene-connected benzimidazoles coordinate a zinc ion that also binds to the active-site Ser-195 Oγ and His-57 Nε2 (44). The zinc-mediated binding enhancement of BABIM-like inhibitors is particularly large but not restricted to tryptase.
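To put the APPA dissociation constant quoted above on an energy scale, the standard relation ΔG° = RT ln(Kd / 1 M) can be applied. The sketch assumes that Ki approximates the dissociation constant of the complex and that the measurement temperature was 25 °C; the temperature is not stated in this text.

```python
import math

R = 8.314462618  # J mol^-1 K^-1, molar gas constant


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kJ/mol from a dissociation constant.

    dG = R * T * ln(Kd / 1 M); negative values indicate favorable binding.
    The temperature default (25 degrees C) is an assumption.
    """
    return R * temperature_k * math.log(kd_molar) / 1000.0


dg = binding_free_energy(0.71e-6)  # Ki of APPA for tryptase, 0.71 μM
print(f"{dg:.1f} kJ/mol")  # -35.1 kJ/mol
```

About -35 kJ/mol (roughly -8.4 kcal/mol) is a respectable affinity for a small single-site ligand, which is consistent with the good fit described above.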

Toward the east, the substrate-binding site of tryptase is bounded not only by the side chains of Tyr-37B and Tyr-74 of monomer A, but also by the Phe-153 benzyl group and the 152-spur of the neighboring monomer B. Thus, binding of extended substrate chains is limited to about P5' (Fig. 7).




Fig. 7. View from the LDTI inhibitor (represented only by its 
reactive site loop P7 to P3') toward the active-site cleft. The P1 Lys
residue is buried. 



Toward the north, the 97-loop of monomer A borders the 
substrate binding region in a manner different from most other 
serine proteinases, and together with the side chains of Phe-94, 
Ala-97, and Gln-98 of monomer D forms a projecting "can- 
opy." The S2 subsite underneath is open and larger than that 
of trypsin. The S3/S4 subsite above the Trp-215 indole moiety 
is fully blocked by the side chain of Gln-98 and the phenolic 
group of Tyr-95 provided by monomer D. Toward the west, 
however, the substrate-binding site is bordered exclusively by 
segments of the D-monomer, in particular the His-57 imida- 
zole ring and segment 57-60. Thus, the active centers of 
monomers A and D (B and C) are spatially close (distance ≈23
Å for the A-D pair) to each other in the tryptase tetramer,
rendering the tryptase tetramer suitable for the specific bind- 
ing of bifunctional inhibitors with relatively short spacers. 

The central pore of tryptase restricts the size of accessible 
substrates and inhibitors considerably. For larger proteins such 
as fibronectin and the zymogens of stromelysin-1 and uro- 
kinase-type plasminogen activator, the cleavage sites must be 
extended into the active sites. Docking experiments with 
C-terminally truncated prostromelysin-1 (45) and with single-chain
tissue plasminogen activator (29) as a model for
prourokinase show that the activation cleavage loops of these 
proproteinases must be extracted from their crystal structures 
to allow binding in the tryptase active center. More flexible 
peptides, in contrast, could easily thread through the pore of 
the tetramer to be processed or destroyed. Flexible polypep- 
tide chains with two distant basic residues, as in "vasoactive 
intestinal peptide" (18), might even dock to adjacent active 
sites simultaneously to produce fragments of distinct length. 

The active centers of the tryptase monomers are also largely inaccessible to macromolecular inhibitors. The only known exception is LDTI, an "atypical" Kazal-type inhibitor that is smaller than the classical members of this family (16). LDTI has been shown to bind to trypsin through its reactive-site loop (residues P4 to P4') in a canonical manner (17, 46). In the model of the complex with tryptase monomer A, the four N-terminal residues preceding this binding segment could bend toward the south (with respect to Figs. 3 and 7), leading to the juxtaposition of the basic Lys-I1-Lys-I2 amino terminus (with the letter I identifying inhibitor residues) with the carboxylate groups of Asp-143 and Asp-147 of monomer A. Alternatively, the two Lys residues could interact with Asp-60B of monomer D. The involvement of such electrostatic interactions is supported by the deleterious effect of deletions and substitutions of these basic residues on the affinity of LDTI toward tryptase but not trypsin (17). The LDTI reactive-site loop, running from Cys-I14 (P5) to Pro-I22 (P4'; ovomucoid numbering), is relatively small compared with classical Kazal-type inhibitors, allowing a good overall fit to the restricted substrate binding groove (Figs. 7 and 8a). Furthermore, its central helix is one turn shorter, so that it just fits into the central pore of the tetramer on canonical binding to the active site of monomer A, with only a few narrow contacts of its molecular antipole, opposite to its reactive-site loop, with the 147-loop of monomer D. Docking of a second LDTI molecule is possible at the opposite active centers of either monomer B or monomer C (Fig. 8a). A slight collision between Cys-I56 and Gly-I28 of two bound LDTI molecules could be relieved by minor torsion in the proteinase-inhibitor interfaces, as observed for other canonically binding inhibitors such as eglin c (46). Any such torsion in the LDTI molecule bound to monomer A would impose an opposing torsion in the LDTI molecule bound to monomer B, facilitating such a relaxation.
The simultaneous binding of two LDTI molecules to the 
tetramer is in good agreement with experimental results 
showing ~50% inhibition of the cleavage activity toward small chromogenic substrates by nanomolar LDTI concentrations (16). Modeling experiments with more elongated classical Kazal-type inhibitors or with the prototypical bovine pancreatic trypsin inhibitor indicate strong collisions of their distal pole segments with the neighboring monomers D and B, in particular with the 147-loops, explaining the observed inactivity of these inhibitors toward tryptase (Fig. 8b). The central portion of the two-domain mucous proteinase inhibitor (MPI = SLPI = HUSI-I) would clash with the A-D interface region of the tryptase tetramer if bound to the active site of monomer A (Fig. 8c) via its inhibitorily active second domain (47). Similarly, elafin (= SKALP), an inhibitor corresponding to the MPI second domain (48), should not be able to inhibit tryptase. The much larger plasma proteinase inhibitors are clearly far too bulky to fit into the narrow pore of the tryptase tetramer and gain access to one of the active centers.

10990 Colloquium Paper: Sommerh Proc. Natl. Acad. Sci. USA 96 (1999)

Fig. 8. Models of the interaction of the human tryptase tetramer with proteinaceous inhibitors. The tryptase tetramers are shown as green ribbons. An inhibitor molecule (blue) is modeled into the active site of monomer A by superposition of the proteinase moiety of known proteinase-inhibitor complexes onto a tryptase monomer. For LDTI and BPTI the target proteinase was trypsin (17, 49), for MPI chymotrypsin (47). The active sites of the other tryptase monomers are occupied by APPA molecules (orange). Parts of the inhibitors clashing with the structure of tryptase (i.e., a distance smaller than 1.5 Å between the Cα atoms of the respective molecules) are highlighted in red. (a) In addition to one molecule of the "atypical" Kazal-type inhibitor LDTI bound to tryptase monomer A, a second molecule (shown in pink and yellow) can bind to the active site of either monomer B or C. (b) Bovine pancreatic trypsin inhibitor (aprotinin). (c) Human mucous proteinase inhibitor bound to tryptase with its inhibitorily active second domain.
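The clash criterion used for the Fig. 8 models (any Cα–Cα distance under 1.5 Å between a docked inhibitor and the tetramer) amounts to a pairwise distance screen. The following is a minimal sketch, not the authors' modeling procedure. It assumes that Cα coordinates have already been extracted into dictionaries keyed by residue labels. The labels and coordinates below are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def find_ca_clashes(
    inhibitor_ca: Dict[str, Coord],
    tetramer_ca: Dict[str, Coord],
    cutoff: float = 1.5,
) -> List[Tuple[str, str, float]]:
    """Return (inhibitor residue, tetramer residue, distance in Å) for every
    pair of Cα atoms closer than `cutoff` (1.5 Å is the Fig. 8 criterion)."""
    clashes = []
    for inh_label, inh_xyz in inhibitor_ca.items():
        for tet_label, tet_xyz in tetramer_ca.items():
            d = math.dist(inh_xyz, tet_xyz)  # Euclidean distance (Python 3.8+)
            if d < cutoff:
                clashes.append((inh_label, tet_label, round(d, 2)))
    return clashes


# Hypothetical toy coordinates (Å); labels are illustrative only.
inhibitor = {"I12": (0.0, 0.0, 0.0), "I30": (10.0, 0.0, 0.0)}
tetramer = {"D147": (1.0, 0.0, 0.0), "B150": (20.0, 0.0, 0.0)}
print(find_ca_clashes(inhibitor, tetramer))  # [('I12', 'D147', 1.0)]
```

This brute-force double loop is adequate for a few hundred Cα atoms. For whole tetramers, a spatial grid or k-d tree would scale better.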

CONCLUSION 

In summary, the structure of the βII-tryptase tetramer has been identified based on the four crystallographically independent, quasi-identical monomers and the analysis of their arrangement within the crystal packing. With its frame-like architecture and its active centers facing a narrow central pore, the resulting tryptase tetramer structure explains most of the distinct properties of the biologically active tryptase tetramer in solution. The unusual substrate specificity, with a preference for peptidergic substrates, and the resistance to proteinaceous inhibitors other than LDTI are both caused by the limited accessibility of the active sites within the narrow central pore. The tetramer can be stabilized by heparin glycosaminoglycan chains longer than ~20 sugar residues, the length required to bridge the weaker of the two distinct monomer-monomer interfaces. The loss of enzymatic activity on dissociation of the tetramer is caused by internal molecular groups stabilizing a zymogen-like state rather than the active state. Finally, knowledge of the structure of the monomer active center and of the distances between neighboring active sites allows the rational design of multifunctional inhibitors. Such inhibitors, binding to more than one active center, will ideally have potentiated affinity, conferring selectivity for the tryptase tetramer. They will be valuable as pharmacological tools to probe the pathophysiological function(s) of tryptases in vivo and may have therapeutic potential against asthma and other mast-cell-related disorders.
15. Hosts were mounted in 1.34- and 1.00-mm-diameter holes in white plastic squares (2 by 2 cm). Each wasp was allowed to complete examination and oviposition. The wasps were observed individually to prevent repeated parasitization of the same host. Trials in which the wasp left the host before completing oviposition were rejected.

16. Mean ± SD was used throughout. Statistical significance was determined by t tests.

17. S. E. Flanders, Pan-Pac. Entomol. 11, 175 (1935).

18. Head length was measured from the medial ocellus to the tip of the closed mandibles by using an ocular micrometer. Wasps differed significantly in mean head length between large and small treatment groups (P < 0.001).

19. Single hosts were mounted on white cardboard squares (2 by 2 cm) with gum arabic. After host examination was completed, wasps were observed as in (15).

20. Measurements made from films of the initial transit demonstrate a significant linear relation between wasp body length and stride length [slope, 0.58 ± 0.064 (SE); n = 15, P < 0.01].

21. Wasps were observed on single hosts mounted on cardboard cards with gum arabic. Only wasps that completed their host examination and began ovipositing were included in the data. For details of methods and results, see J. M. Schmidt and J. J. B. Smith [J. Exp. Biol. 129, 151 (1987)].
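Note 20 reports a slope with its standard error (0.58 ± 0.064, n = 15). The note does not name the fitting method. The following is a minimal sketch, assuming ordinary least squares, that shows how such a slope and SE are obtained. The data are hypothetical and are not the wasp measurements.

```python
import math
from typing import Sequence, Tuple


def ols_slope_with_se(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope of y on x and its standard error."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least three points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (slope and intercept fitted).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, se_slope


# Hypothetical body lengths and stride lengths (arbitrary units).
slope, se = ols_slope_with_se([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"slope = {slope:.2f} ± {se:.3f} (SE)")  # slope = 0.60 ± 0.283 (SE)
```

The ratio slope/SE is the t statistic (n − 2 degrees of freedom) that is conventionally used to test whether the slope differs from zero.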
The Three-Dimensional Structure of Asn102 Mutant of Trypsin: Role of Asp102 in Serine Protease Catalysis



S. Sprang,* T. Standing, R. J. Fletterick, R. M. Stroud, 
J. Finer-Moore, N-H. Xuong, R. Hamlin, W. J. Rutter, 
C. S. Craik 

The structure of the Asn102 mutant of trypsin was determined in order to distinguish whether the reduced activity of the mutant at neutral pH results from an altered active site conformation or from an inability to stabilize a positive charge on the active site histidine. The active site structure of the Asn102 mutant of trypsin is identical to that of the native enzyme with respect to the specificity pocket, the oxyanion hole, and the orientation of the nucleophilic serine. The observed decrease in rate results from the loss of nucleophilicity of the active site serine. This decreased nucleophilicity may result from stabilization of a His57 tautomer that is unable to accept the serine hydroxyl proton.



THROUGHOUT THE DIVERSE FAMILY of serine proteases, the three residues implicated in the bond-breaking and bond-making events of protease catalysis, His57, Asp102, and Ser195 (chymotrypsin numbering system), are conserved. The spatial relation among these residues is virtually equivalent in the three-dimensional structures of all serine proteases studied. The catalytic roles of Ser195 and His57 are firmly established (1). The substrate (ester or amide) carbonyl carbon undergoes a nucleophilic attack by the hydroxyl group of Ser195, which leads to the formation of an acyl-enzyme intermediate. His57 functions as a catalytic base by assisting in the transfer of a proton from the serine hydroxyl to the substrate leaving group. The role of Asp102 has not yet been defined. The three functions proposed for this residue are (i) stabilizing the His57 conformation that is required for catalysis (2), (ii) stabilizing the appropriate His tautomer (2), and (iii) stabilizing the positively charged histidine that forms during the reaction (3). The proposed functions were tested with a genetically engineered mutant of the anionic isozyme of rat trypsin that was constructed by replacing Asp102 with an asparagine (4), designated here as D102N trypsin, where D is Asp and N is Asn.

S. Sprang, T. Standing, R. J. Fletterick, R. M. Stroud, J. Finer-Moore, Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143.

N.-H. Xuong and R. Hamlin, Department of Physics, University of California, San Diego, La Jolla, CA 92093.
W. J. Rutter, Hormone Research Institute, University of California, San Francisco, San Francisco, CA 94143.
C. S. Craik, Department of Biochemistry and Biophysics and Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94143.

*Present address: Howard Hughes Medical Institute, University of Texas, Dallas, TX 75235.

The activity of D102N trypsin has been studied as a function of pH (4). The activity of this mutant enzyme toward a variety of substrates is reduced by four orders of magnitude relative to trypsin between pH 7 and 9, where the latter is optimally active. The Michaelis constant, Km, of the mutant enzyme is virtually unaffected (4). This raises the question of whether the chemical properties of the asparagine itself or conformational differences in the enzyme are responsible for the loss of activity in D102N trypsin. To address this point, we describe the three-dimensional structure of D102N trypsin at both pH 6 and pH 8.

Orthorhombic crystals (space group P2₁2₁2₁) of rat D102N trypsin grown at pH 6 in the presence of benzamidine were




Fig. 1. An α-carbon diagram (stereoscopic) of anionic rat D102N trypsin at pH 6 (9-12) (green) superimposed on bovine trypsin (blue). Residues in rat trypsin (72) that differ in side-chain type from corresponding residues in the bovine sequence (25) are highlighted in red. Side-chain positions for residues Asn102, His57, and Ser195 are also shown in red. The root-mean-square (rms) difference in position between corresponding atoms of D102N rat trypsin in the crystals grown at pH 6 and bovine trypsin (13, 26) after least-squares superposition is 0.47 Å for all main-chain atoms and 0.67 Å for all side-chain atoms. Values quoted are the average of those obtained for molecules 1 and 2 in the asymmetric unit of the D102N trypsin crystals grown at pH 6. The computed rms distance may underestimate the true differences between the two structures because bovine trypsin was used as the initial phasing model. The rms difference after superposition between all atoms of the two molecules in the asymmetric unit is 0.21 Å. The rms deviation between the main-chain atoms of the pH 6 and pH 8 crystal forms of D102N trypsin is 0.25 Å.
obtained by vapor diffusion against polyethylene glycol (Figs. 1 and 2, top). Diffraction data were measured to 2.3 Å resolution with monochromatic copper Kα radiation, with the crystal cooled to 4°C, on a multiwire area detector with the procedures described by Xuong et al. (5) (Table 1). A cubic crystal form (space group I23) was obtained at pH 8 by vapor diffusion against magnesium sulfate. Diffraction data for this form were recorded to 2.8 Å resolution with monochromatic copper Kα radiation on a diffractometer (7) (Table 1 and Fig. 2, middle). Both crystal structures were determined by molecular replacement methods (5) and refined by stereochemically restrained minimization of the differences between observed and computed structure amplitudes (6, 9-12) (Table 1 and Figs. 1 and 2).
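Refinement drives down the mismatch between observed and calculated amplitudes, and that mismatch is summarized by the crystallographic R factor. Table 1 gives R values of 0.16 at pH 6 and 0.21 at pH 8, and the Fig. 2 legend gives 0.35 at the molecular replacement translation step. The following is a minimal sketch of the standard formula. It assumes that the amplitudes are already paired by reflection index and that any scale factor is supplied by the caller rather than refined. The amplitudes are hypothetical.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float], scale: float = 1.0) -> float:
    """Crystallographic R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|).

    `scale` (k) puts the calculated amplitudes on the observed scale; here it
    is supplied by the caller rather than refined.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need matched, non-empty amplitude lists")
    numerator = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Hypothetical amplitudes for three reflections.
print(round(r_factor([100.0, 200.0, 300.0], [90.0, 210.0, 300.0]), 4))  # 0.0333
```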

The tertiary structures of the mutant rat anionic trypsin at both pH 6 and pH 8 are essentially identical to that of the bovine enzyme (7, 13). The largest differences between the enzymes from rat and cow are localized to four segments in the NH2-terminal domain, all outside the β core, where deviations between corresponding main-chain atoms exceed 1.0 Å (Fig. 1). The structural similarity between D102N trypsin and bovine trypsin is quite high in the neighborhood of the active site; no significant differences in the relative positions (<0.3 Å) (Table 2) or relative diamai features are observed for Asn102, Ser195, or the oxyanion binding site (14), that is, the main-chain amide groups of residues 193 and 195. The only exception occurs in crystals grown at pH 6, where the side chain of His57 is statistically disordered (Fig. 2, top) (11, 12) and is partitioned between the gauche conformation observed in native trypsin and an alternative trans conformation, in which the imidazole side chain is





Fig. 2. (Top) The difference Fourier map (Fobs − Fcalc) at the catalytic site of D102N rat trypsin at pH 6. The side-chain atoms of His57 were omitted from the calculated structure factors and phases. The trans and gauche conformations of the histidine side chain, related by a torsional difference of 70°, are superimposed on the electron density. The difference electron density is shown at a contour level of 0.2 electron per cubic angstrom. The map extends over all atoms shown in the figure. No negative density is present in this region at the 0.2 electron per cubic angstrom level. Two lobes of flat, ellipsoidal density are evident, both continuous with the density corresponding to the Cβ atom of His57. The peaks are of unequal magnitude: the stronger peak is located within the active site between the side chains of Asn102 and Ser195, at a position coincident with His57 in the structures of bovine trypsin, and the second, weaker peak is outside the active site pocket. The shape of both lobes of density and their proximity to the Cβ atom of His57 rule out the assignment of either peak to ordered solvent. (Middle) A difference Fourier map (Fobs − Fcalc) showing the catalytic site of D102N trypsin from crystals grown at pH 8. The side-chain atoms of His57 were omitted from the calculated structure factors and phases. At this pH, only the gauche conformer of His57 is observed in the difference electron density. The histidine conformation is almost identical to that observed in the bovine trypsin-benzamidine complex (7). The structure of D102N trypsin at pH 8 was determined by molecular replacement, using the refined structure at pH 6 as a search model. The side-chain atoms of Asn102, His57, and Ser195, as well as solvent, benzamidine, and calcium ion atoms, were omitted from this model. The rotation function produced only one significant peak and was evaluated with all data to 2.8 Å and an integration radius from 4.0 to 16 Å. The R factor at the correct translation position was 0.35. A difference Fourier map computed with phases from the molecular replacement solution revealed the positions of the omitted side chains, the calcium ion, and the benzamidine molecule. These were included in the phasing model, and the structure was subjected to 23 cycles of stereochemically restrained crystallographic refinement (Table 1) (6). (Bottom) The bovine trypsin structure (thin lines) superimposed on that of D102N rat trypsin crystallized at pH 6.0 (thick lines). Both conformers of His57 in D102N rat trypsin are shown.



displaced from the active site toward the solvent. Only the native gauche His57 conformation is observed in crystals grown at pH 8. Unless otherwise stated, all references to His57 in the following discussion refer to the native conformer.
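The rms differences quoted in the Fig. 1 legend are computed after least-squares superposition: 0.47 Å for main-chain atoms against bovine trypsin and 0.21 Å between the two molecules in the asymmetric unit. The Kabsch algorithm is one standard way to find that optimal rotation. The paper does not say which superposition program was used, so the following is only an illustrative sketch. It assumes that the atoms have already been paired one-to-one and stored as N × 3 NumPy arrays.

```python
import numpy as np


def superposed_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMS deviation between matched N x 3 coordinate sets after the optimal
    rigid-body superposition of p onto q (Kabsch algorithm)."""
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("p and q must both be N x 3 arrays of paired atoms")
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so that the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Hypothetical check: a rotated and translated copy superposes with ~zero rmsd.
q = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 0.0, 1.5]])
rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
p = q @ rz90.T + np.array([5.0, -3.0, 2.0])
print(round(superposed_rmsd(p, q), 6))  # 0.0
```

The determinant correction matters here. Without it, the SVD can return an improper rotation (a mirror image), which would understate the rms difference for chiral structures.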

In both the pH 8 and pH 6 crystal forms, Asn102 is superimposable within experimental error on Asp102 of the bovine enzyme (Fig. 2). In trypsin, one of the carboxylate oxygen atoms of Asp102 accepts hydrogen bonds from the main-chain amide groups of residues 56 and 57, and the second accepts hydrogen bonds from both the Nδ1 atom of
Fig. 3. (A) In the hydrogen bond network found in D102N trypsin above neutral pH, His57 is unable to accept a proton from Ser195 Oγ. The orientation of the hydrogen bond between His57 and Ser195 is the reverse of that observed in the bovine trypsin-benzamidine structure (7). (B) In the hydrogen bond network of wild-type trypsin, His57 is an acceptor for the proton from Ser195.



Table 1. Crystal and diffraction data for D102N trypsin. The diffraction data for the crystals grown at pH 6 were collected with an area detector, whereas the data for the crystals grown at pH 8 were collected with a diffractometer.

                                      Crystal form
                                      pH 6            pH 8
Crystal data
  Space group                         P2₁2₁2₁         I23
  Cell dimensions (Å)                 a = 40.4        a = 124.4
                                      b = 92.0
                                      c = 127.4
  Molecules per asymmetric unit       2
Diffraction data
  Resolution (Å)                      2.3             2.8
  Total observations                  90,000          5,000
  Unique observations                 22,000          4,500
  R_sym*                              0.05
Refinement results
  R_cryst†                            0.16            0.21
  Resolution (Å)                      6.0-2.3         8.0-2.8
  rms difference (bond) (Å)‡          0.03            0.03
  rms difference (angle) (Å)‡         0.05            0.05

*Agreement between symmetry-related structure-factor magnitudes, $R_{\mathrm{sym}} = \sum_h \sum_i \left| F_i(h) - \langle F(h) \rangle \right| \big/ \sum_h \sum_i F_i(h)$, where $\langle F(h) \rangle$ is the mean structure-factor magnitude of the i observations of reflections that are related to the Bragg index h.
†Agreement between the observed ($F_{\mathrm{obs}}$) and calculated ($F_{\mathrm{calc}}$) structure-factor magnitudes, $R_{\mathrm{cryst}} = \sum \big| |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \big| \big/ \sum |F_{\mathrm{obs}}|$.
‡Root-mean-square deviation between the ideal and refined bond distances and angle distances.



His57 and the Oγ atom of Ser214 (Table 2 and Fig. 3). In D102N trypsin, two chemically distinct conformations are possible for Asn102. In one of these, the Nδ2 group of Asn102 would be oriented toward the main-chain amide groups of residues 56 and 57. Because the asparagine amido group cannot form a hydrogen bond with the main-chain amides in this orientation, these groups could approach no closer than the sum of their van der Waals radii (>3.4 Å).

The alternative conformation is related to the first by a rotation of 180° about the Cβ-Cγ bond. In this case, the Oδ1 atom of asparagine could accept hydrogen bonds from the main-chain amide groups, whereas the Nδ2 atom could donate hydrogen bonds to the His57 imidazole and Ser214 hydroxyl groups. The two conformations can be distinguished by the observed distances between the main-chain amides of residues 56 and 57 and the nearest atom of the Asn102 side chain. The interatomic distances in the present model (25, 16) support the assignment of the tautomeric form shown in Fig. 3A. One of the Asn102 amido atoms is located 2.6 Å from the amide nitrogen of residue 56 and 3.1 Å from the amide of residue 57. This atom of the Asn102 side chain could therefore form hydrogen bonds with these two amides and is identified as Oδ1. Asn102 Nδ2 would then be a hydrogen bond donor to both the Nδ1 of His57 and the Oγ of Ser214. In bovine trypsin, Asp102 accepts hydrogen bonds from both of these residues.
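The assignment argument above can be written as a simple decision rule. Backbone NH groups are hydrogen-bond donors, so a side-chain atom found within hydrogen-bonding distance of both the residue 56 and residue 57 amide nitrogens must be the acceptor oxygen. The following sketch assumes the text's 3.4 Å van der Waals limit as the cutoff. The atom labels "A" and "B", and the distances for "B", are hypothetical.

```python
from typing import Dict, Tuple

# Closest approach (Å) of two groups that cannot hydrogen bond, per the text.
VDW_LIMIT = 3.4


def assign_asn_amide(distances: Dict[str, Tuple[float, float]],
                     limit: float = VDW_LIMIT) -> Dict[str, str]:
    """Label the two terminal Asn side-chain atoms as OD1 or ND2.

    `distances` maps each (hypothetical) atom label to its distances from the
    main-chain amide N of residues 56 and 57. Backbone NH groups are donors,
    so an atom closer than `limit` to both must be the acceptor oxygen OD1.
    """
    close = [label for label, (d56, d57) in distances.items()
             if d56 < limit and d57 < limit]
    if len(distances) != 2 or len(close) != 1:
        raise ValueError("orientation is ambiguous with these distances")
    (od1,) = close
    return {label: ("OD1" if label == od1 else "ND2") for label in distances}


# Atom "A" uses the 2.6 and 3.1 Å values from the text; "B" is made up.
print(assign_asn_amide({"A": (2.6, 3.1), "B": (4.4, 4.9)}))  # {'A': 'OD1', 'B': 'ND2'}
```

The rule refuses to decide when both atoms, or neither, lie within the cutoff. This mirrors the fact that the distances alone would not then discriminate between the two orientations.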

In the proposed crystallographic model, Asn102 can serve only as a hydrogen bond donor to His57; the polarity of the hydrogen bond network involving His57, residue 102, and Ser195 is reversed in the mutant enzyme with respect to that in bovine trypsin (Fig. 3). For pH values greater than the pKa of the imidazole (Ka is the ionization constant), the monoprotonated tautomer must be protonated at Nε2, because it accepts a hydrogen bond from Asn102 at Nδ1. In contrast to trypsin, the Nε2 of His57 in the mutant enzyme is a potential hydrogen bond donor to the Oγ of Ser195. Thus His57 cannot act as a general base in transferring a proton from Ser195, and this probably accounts for the diminished activity of D102N trypsin near neutral pH. In trypsin above neutral pH, where the enzyme becomes active, His57 is protonated at Nδ1 (17). Therefore, the negatively charged Asp102 maintains the unprotonated Nε2, with its lone pair of electrons, as the general base catalyst for transfer of the proton from the Oγ of Ser195 to the leaving group.

A difference Fourier map (Fig. 2, top) for the crystals grown at pH 6 was computed with the histidine omitted from the calculated phases and structure factors, revealing two sites for the side chain (11, 12). In one of these, the Cβ-Cγ bond is trans to Cα-N, and the imidazole is rotated away from the catalytic site. The trans His57 conformer does not form a hydrogen bond with Asn102 or Ser195 but instead is in contact with a solvent water molecule at the surface of the enzyme (Table 2). The alternative position is nearly gauche and similar to the His57 conformation in bovine trypsin and in D102N trypsin crystallized at pH 8 (Fig. 2, bottom).

Integration of the difference electron density indicates that the occupancy ratio of the gauche to trans isomers is approximately 2 to 1 (Fig. 2, top) (11, 12). A difference map computed with phases derived from all of the atoms in the refined model reveals residual positive electron density in the vicinity of the Cε1 of His57 (gauche), which may correspond to a partially occupied solvent water that is present in the active site pocket when His57 is displaced (trans).
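Turning integrated difference density into the reported ~2:1 gauche:trans occupancy ratio is a normalization step. The following is a minimal sketch. It assumes that each lobe's integrated density scales linearly with occupancy, which holds only if both conformers contribute the same atoms with comparable temperature factors. The input values are hypothetical and are chosen to match the reported ratio.

```python
from typing import Dict


def occupancies_from_density(integrated: Dict[str, float]) -> Dict[str, float]:
    """Convert integrated difference density for alternative conformers into
    fractional occupancies that sum to 1."""
    total = sum(integrated.values())
    if total <= 0:
        raise ValueError("integrated densities must sum to a positive value")
    return {name: value / total for name, value in integrated.items()}


# Hypothetical integrated densities in a 2:1 ratio, as reported for His57 at pH 6.
occ = occupancies_from_density({"gauche": 2.0, "trans": 1.0})
print({k: round(v, 2) for k, v in occ.items()})  # {'gauche': 0.67, 'trans': 0.33}
```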

Fig. 4. A histogram showing the χ1 torsion angles of 353 histidines found in 53 protein structures refined to better than 2.0 Å resolution (11, 26). The χ1 angle of 92° (gauche) observed in His57 of bovine trypsin is rare. Angle values are trimodally distributed about +60°, 180°, and −60°. The trans conformer that occurs at pH 6 in D102N rat trypsin is observed more frequently.

Table 2. Conformational and stereochemical data for active site residues in bovine and D102N trypsins. Values for the two molecules in the asymmetric unit of D102N trypsin grown at pH 6 are averaged. Distances are not given for the 2.8 Å resolution crystals grown at pH 8. The wild-type coordinates are from the bovine trypsin-benzamidine crystal structure (7).

                                                  Conformational        Hydrogen bond
                                                  angles (degrees)      distance (Å)
Residue            Atoms                          Asn102   Wild type    Asn102   Wild type
His57 (gauche)     N-Cα-Cβ-Cγ                     84       92
His57 (trans)      N-Cα-Cβ-Cγ                     157
His57 (gauche)     Cα-Cβ-Cγ-Nδ1                   -96      -100
His57 (trans)      Cα-Cβ-Cγ-Nδ1                   -93
Ser195             N-Cα-Cβ-Oγ                     -59      -77
His57 (gauche)     Nδ1-Asn/Asp102 Nδ2/Oδ2                               2.8      2.7
His57 (gauche)     Nε2-Ser195 Oγ                                        3.2      3.0
His57 (trans)      Nε2-H2O O                                            3.0
Asn102/Asp102      Oδ1-Ala56 N                                          2.6      2.9
Asn102/Asp102      Oδ1-His57 N                                          3.1      2.8
Asn102/Asp102      Nδ2/Oδ2-Ser214 Oγ                                    2.7      2.8
Ser195             Oγ-H2O O                                             2.9      3.0

The displacement of His57 from the active site of D102N trypsin below neutral pH is probably a consequence of steric conflict between the protonated Nδ1 atom of His57 and the proton on the Nδ2 of Asn102. D102N trypsin, like its natural homolog, crystallizes only in the presence of the substrate analog benzamidine, and there are no apparent steric conflicts between His57 and other residues in the catalytic site. However, even in trypsin, the native gauche conformation of the His57 imidazole may be energetically unfavored and require hydrogen bond stabilization by Asp102. A survey of the χ1 angles of His side chains in refined protein structures (Fig. 4) shows that the conformation found in bovine trypsin is uncommon. Steric hindrance arises from close contacts between the […] and Cδ2 imidazole atoms and the main-chain carbonyl carbon [contact distances of 3.0 Å and 3.2 Å, respectively, are measured from the coordinates of bovine trypsin (7)]. Nevertheless, His57 is well ordered in crystals of native trypsin (13, 17), and tritium exchange measurements indicate that expulsion of His57 from the active site pocket occurs in solution with a frequency of less than 1 in 50 over the pH range 1.5 to 9 (18). Displacement of His57 from the gauche conformation in serine protease crystals has so far been seen only as a result of steric conflict in covalent intermediates formed with certain substrate analogs (19, 20) or as a result of the introduction of heavy metals into the active site (21, 22). In native trypsin, the histidine conformation is stabilized by a hydrogen bond between the Nδ1 atom of His57 and the carboxylate oxygen atom of Asp102.
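The χ1 survey compares each histidine's N-Cα-Cβ-Cγ torsion with the three staggered positions (+60°, 180°, −60°). The following sketch computes a torsion angle from four atom positions with the usual atan2 formula, using the IUPAC sign convention, and reports the nearest staggered value. Because the labels g+ and g− are assigned differently by different authors, the sketch reports the bin center rather than a name. The test angles come from Table 2 and Fig. 4; the coordinates in the test are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees (IUPAC sign, range -180..180).
    For chi1 of His, pass the N, CA, CB and CG coordinates."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def nearest_staggered(angle: float) -> Tuple[float, float]:
    """Return the closest staggered value (+60, 180 or -60) and the deviation from it."""
    best = (60.0, 360.0)
    for centre in (60.0, 180.0, -60.0):
        dev = abs((angle - centre + 180.0) % 360.0 - 180.0)  # circular difference
        if dev < best[1]:
            best = (centre, dev)
    return best


print(nearest_staggered(92.0))   # (60.0, 32.0): wild-type His57 chi1 (Table 2)
print(nearest_staggered(157.0))  # (180.0, 23.0): trans conformer at pH 6
```

A deviation of about 30° from the nearest staggered position is consistent with the caption's remark that the 92° angle of His57 in trypsin is rare.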

In D102N trypsin, the conformation of His57 appears to be linked to its protonation state. In the monoprotonated imidazole tautomer that predominates above neutral pH, the Nδ1 atom of His57 can accept a hydrogen bond from Nδ2 of Asn102. Protonation at the histidine Nδ1 at lower pH results in the loss of this hydrogen bond and possibly also in steric conflict with the Nδ2 of Asn102. The imidazole is then free to rotate to the more favored trans conformation, away from the catalytic site. Orthorhombic crystals of D102N trypsin are grown near the pKa of histidine, and thus the statistically disordered histidine side chain may reflect an equilibrium distribution of mono- (gauche) and diprotonated (trans) forms of the His57 imidazole. The variant D102N trypsin is able to react with the active site titrant tosyl-L-lysine chloromethyl ketone (TLCK) at 20 to 70% of the rate observed for trypsin from pH 7.2 to 8.7 (4), which suggests that, as in the pH 8 crystals, a substantial proportion of D102N trypsin molecules in solution contain His57 in the native gauche conformation.
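The idea that crystals grown near the histidine pKa hold a mixture of mono- and diprotonated imidazole follows from the Henderson-Hasselbalch relation. The following sketch evaluates that relation. The text reports no pKa for His57 in the mutant, so the value of 6.0 used below is purely a hypothetical illustration.

```python
def fraction_diprotonated(ph: float, pka: float) -> float:
    """Fraction of an imidazole in the doubly protonated (imidazolium) form,
    from the Henderson-Hasselbalch relation: 1 / (1 + 10**(pH - pKa))."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


PKA_ASSUMED = 6.0  # hypothetical pKa for illustration; not reported in the text
for ph in (6.0, 7.0, 8.0):
    print(ph, round(fraction_diprotonated(ph, PKA_ASSUMED), 3))
# 6.0 0.5
# 7.0 0.091
# 8.0 0.01
```

Under this assumption, the diprotonated fraction falls roughly tenfold per pH unit above the pKa. This is qualitatively in line with the mixed conformers seen at pH 6 and the single gauche conformer seen at pH 8.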

As a result of the substitution of Asn for Asp102, the mutant trypsin reacts with diisopropylfluorophosphate (DFP), a reagent that specifically titrates the Ser195 nucleophile, 10⁴ times more slowly than does trypsin (4). The decreased Ser195 nucleophilicity in D102N trypsin probably results from the lack of a base in the active site to accept the serine hydroxyl proton. His57 does not act as a base in this mutant because it exists as the incorrect tautomer. Although the tautomeric form of His57 is changed in D102N trypsin, the oxyanion binding site (14), formed by the main-chain amide groups of residues 193 and 195, is unaltered. The reduced activity of the mutant thus gives an upper limit to the contribution of transition state binding alone to the reaction rate. Trypsin normally accelerates the rate of DFP hydrolysis by a factor of 10⁸ (20). Our results suggest that a factor of 10⁴ in rate enhancement may derive from the stabilization and orientation of the lone pair on the Nε2 atom of His57. The remaining factor of 10⁴ can presumably be ascribed to orientation of the nucleophile (Ser195) and to transition state binding. Under alkaline conditions (pH > 10), the rate of catalysis by the mutant approaches 10% of that of the native enzyme (4) through an altered mechanism in which base catalysis appears to be provided by solvent hydroxide. In trypsinogen, the situation is reversed: His57 is correctly oriented, but the oxyanion binding site is not properly formed to stabilize the transition state (21), even after irreversible binding of the transition state analog DFP (23). The reaction rate toward DFP is also reduced by a factor of ~10⁴ relative to trypsin (20), which again places an upper limit of 10⁴ on the rate acceleration attributable to transition state binding. Catalytic rate enhancement by serine proteases is thus partitioned almost equally between (i) orientation and stabilization of the enzyme base His57 and (ii) the correctly oriented serine nucleophile and transition state binding site. Studies of D102N trypsin indicate that the Asp102 residue plays a critical role in the first of these processes, perhaps electronically, through its interaction with His57 (24), and structurally, by providing hydrogen bond stabilization of the functional tautomer and thus maintaining its correct orientation within the catalytic site.
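Rate factors multiply, so their logarithms add. The bookkeeping in the paragraph above is therefore a subtraction of exponents. The following sketch assumes an overall DFP rate enhancement of 10⁸ and a 10⁴-fold loss on replacing Asp102 with Asn. These are the values as read from this text, where the exponents were partly illegible in the source scan. The ~10⁴ trypsinogen figure provides an independent check on the second factor.

```python
import math
from typing import Tuple


def partition_rate_enhancement(total_factor: float, factor_lost: float) -> Tuple[float, float]:
    """Split an overall rate enhancement into (log10 of the part lost when one
    catalytic element is disabled, log10 of what remains). Factors multiply,
    so their logarithms add: log10(total) = log10(lost) + log10(remaining)."""
    if total_factor <= 0 or factor_lost <= 0:
        raise ValueError("rate factors must be positive")
    return math.log10(factor_lost), math.log10(total_factor / factor_lost)


# Assumed values: overall enhancement 1e8; loss in D102N trypsin 1e4.
base_part, remainder = partition_rate_enhancement(1e8, 1e4)
print(base_part, remainder)  # 4.0 4.0
```

With these inputs, each contribution accounts for four orders of magnitude. That is the "almost equal" partition described in the text.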
SERINE PROTEASES FUNCTION IN many biological systems to hydrolyze specific polypeptide bonds. Trypsin, a well-studied member of this family, catalyzes the hydrolysis of peptide and ester substrates that contain lysyl or arginyl side chains. Serine proteases have the triad of residues Asp102, His57, and Ser195 at the active site (chymotrypsin numbering system). X-ray crystallographic studies reveal that these three residues are in close proximity, which suggests they may serve as a functional interacting unit responsible for bond formation and cleavage during catalysis (1). Numerous chemical and physical studies indicate that Ser195 and His57 play crucial roles in catalysis. For example, selective reaction of Ser195 with diisopropylfluor-



C. S. Craik, Departments of Pharmaceutical Chemistry and of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143-0446.
S. Roczniak, C. Largman, W. J. Rutter, Hormone Research Institute and Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143-0448.

*Present address: NutraSweet Company, Mount Prospect, IL 60056.

†Present address: Veterans Administration Hospital, Martinez, CA 94553, and Departments of Internal Medicine and Biological Chemistry, University of California, Davis, CA 95616.



The Catalytic Role of the Active Site Aspartic Acid in 
Serine Proteases 

Charles S. Craik, Steven Roczniak,* Corey Largman,†
William J. Rutter



The role of the aspartic acid residue in the serine protease catalytic triad Asp, His, and Ser has been tested by replacing Asp102 of trypsin with Asn by site-directed mutagenesis. The naturally occurring and mutant enzymes were produced in a heterologous expression system, purified to homogeneity, and characterized. At neutral pH, the mutant enzyme activity with an ester substrate and with the Ser195-specific reagent diisopropylfluorophosphate is approximately 10⁴ times less than that of the unmodified enzyme. In contrast to the dramatic loss in reactivity of Ser195, the mutant trypsin reacts with the His57-specific reagent tosyl-L-lysine chloromethylketone only five times less efficiently than the unmodified enzyme. Thus, the ability of His57 to react with this affinity label is not severely compromised. The catalytic activity of the mutant enzyme increases with increasing pH, so that at pH 10.2 the kcat is 6 percent that of trypsin. Kinetic analysis of this novel activity suggests that this is due in part to participation of either a titratable base or hydroxide ion in the catalytic mechanism. By demonstrating the importance of the aspartate residue in catalysis, especially at physiological pH, these experiments provide a rationale for the evolutionary conservation of the catalytic triad.
A Novel Low-Density Lipoprotein Receptor-Related Protein with Type II Membrane Protein-Like Structure Is Abundant in Heart¹

Yasuhiro Tomita, Dong-Ho Kim, Kenta Magoori, Takahiro Fujino, and Tokuo T. Yamamoto²

Tohoku University Gene Research Center, Sendai 981-8555
Received for publication, May 21, 1998

We report herein the identification of a novel member of the low-density lipoprotein receptor (LDLR) family termed LDLR-related protein 4 (LRP4). Murine LRP4 cDNA encodes a 1113-amino-acid type II membrane-like protein with eight ligand-binding repeats in two clusters. Southern blot analysis of genomic DNA from several different organisms suggests the presence of LRP4 homologues in chicken, which lacks the gene encoding apolipoprotein E, the ligand recognized by the ligand-binding repeats of LDLR. LRP4 transcripts were detected almost exclusively in heart in mouse and humans. Despite the presence of the ligand-binding repeats, COS cells transfected with LRP4 did not show surface binding of β-migrating very-low-density lipoprotein, suggesting that LRP4 plays a role in a pathway other than lipoprotein metabolism.

Key words: LDL receptor family, LDL receptor related protein, membrane protein, 
receptor. 
The low-density lipoprotein receptor (LDLR) family is a growing supergene family that includes LDLR itself (1), apolipoprotein E (apoE) receptor 2 (apoER2) (2, 3), very-low-density lipoprotein receptor (VLDLR) (4, 5), insect vitellogenin receptors (6, 7), LDLR-related protein/α2-macroglobulin receptor (LRP1) (8), a kidney autoantigen gp330/megalin (LRP2) (9, 10), and a recently identified member termed LDLR relative with 11 binding repeats (LR11/sorLA1) (11, 12). All members of this gene family contain the following five structural motifs: (i) complement-type cysteine-rich repeats, termed LDLR ligand-binding repeats or LDLR class A repeats; (ii) cysteine-rich epidermal growth factor (EGF) precursor-type repeats, termed growth factor repeats or LDLR class B repeats; (iii) cysteine-poor spacer regions, with five copies of the sequence YWTD, separating the growth-factor repeats; (iv) a single membrane-spanning region; and (v) a cytoplasmic region with at least one copy of the "NPXY" internalization signal. LDLR is the best characterized protein in this superfamily, and the relationship between structure and function for each module of LDLR has been elucidated by analysis of mutations in patients with familial hypercholesterolemia (13, 14).
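The short sequence signatures listed above can be located in a protein sequence with regular expressions. These are the YWTD spacer motif and the NPXY internalization signal, in which X is any residue. The following sketch uses a made-up sequence, not LRP4. In practice, domain assignment relies on alignments and domain databases rather than on motif hits alone.

```python
import re
from typing import List, Tuple

# "X" in NPXY stands for any residue; the lookahead lets overlapping hits be reported.
NPXY = re.compile(r"(?=(NP[A-Z]Y))")
YWTD = re.compile(r"(?=(YWTD))")


def find_motif(sequence: str, pattern: "re.Pattern[str]") -> List[Tuple[int, str]]:
    """Return (1-based position, matched text) for each motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


# A made-up sequence fragment for illustration only.
seq = "MKYWTDAANPVYGGYWTDNPAY"
print(find_motif(seq, YWTD))  # [(3, 'YWTD'), (15, 'YWTD')]
print(find_motif(seq, NPXY))  # [(9, 'NPVY'), (19, 'NPAY')]
```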

¹ This work was supported by the Japan Society for the Promotion of Science Grant RFTP97L00803. Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession No. AB013874.

² To whom correspondence should be addressed: Fax: +81-22-263-9295, E-mail: yama@biochem.tohoku.ac.jp

Abbreviations: apoE, apolipoprotein E; apoER2, apolipoprotein E receptor 2; LDLR, low-density lipoprotein receptor; LRP, low-density lipoprotein receptor-related protein; VLDLR, very-low-density lipoprotein receptor; β-VLDL, β-migrating very-low-density lipoprotein.

© 1998 by The Japanese Biochemical Society.



Among members of the LDLR family, VLDLR and apoER2 most closely resemble LDLR in structure and, like LDLR, bind apoE-rich β-VLDL with high affinity (2-4). In the chicken, VLDLR is expressed almost exclusively in oocytes and mediates uptake of the yolk precursors VLDL and vitellogenin (15). This receptor-mediated process is critical in non-mammalian vertebrate oogenesis: female chicken mutants lacking VLDLR are sterile (16). In contrast to the chicken, mammalian VLDLR mRNA is abundant in heart, skeletal muscle, brain, and adipose tissues (4). Frykman et al. have shown that mice lacking VLDLR exhibit modest decreases in body weight, body mass index, and adipose tissue mass, while their plasma cholesterol levels, triacylglycerol levels, and lipoprotein profiles are not altered (17). Furthermore, knockout mice lacking both VLDLR and LDLR exhibit a modest hypercholesterolemia (17), whereas apoE knockout mice exhibit a profound hypercholesterolemia (18). These data suggest the presence of other apoE receptors.

To extend our studies on receptors that may play a role in the clearance of apoE-containing lipoproteins from the circulation, we have been characterizing cDNAs belonging to the LDLR superfamily. In a previous study, we characterized a new LDLR-related protein termed LRP3 (19). Human and rat LRP3 are 770-amino-acid type I membrane proteins with the following regions: a putative signal sequence; two isoleucine/leucine/valine-rich regions with an RGD sequence; two ligand-binding repeat regions; a putative transmembrane region; and a proline-rich cytoplasmic region with a tyrosine-based internalization signal. Despite the presence of the ligand-binding repeats, CHO cells transfected with LRP3 failed to bind β-VLDL.

In this study, we have isolated a near full-length cDNA encoding a new member of the LDLR family, termed LDLR-related protein 4 (LRP4), and describe here the molecular characterization of this new receptor-like protein.

Fig. 1. Nucleotide and deduced amino acid sequence of murine LRP4 cDNA. Nucleotide and amino acid residues are numbered on the left. Nucleotide 1 is the A of the initiator AUG codon. Negative numbers refer to the 5'-untranslated region. Two in-frame translation termination codons at −87 and 3342 are indicated by asterisks. The putative transmembrane region is boxed in black. Cysteine residues are circled, and the ligand-binding motif SDE and similar sequences are boxed. Potential N-linked glycosylation sites are underlined, and a potential polyadenylation signal is doubly underlined.

MATERIALS AND METHODS 

Standard Procedures — Standard molecular biology techniques were carried out essentially as described by Sambrook et al. (20). Nucleotide sequencing was performed by the dideoxy-chain termination method (21) using M13 primers, T3 and T7, or specific internal primers. Sequencing reactions were carried out using Taq DNA polymerase with fluorescently labeled nucleotides on an Applied Biosystems Model 373A DNA sequencer. To analyze RNA in murine and human tissues, commercially available Northern blots (Clontech) were used.

cDNA Cloning — A murine heart cDNA library was constructed in the pBluescript vector using poly(A) RNA and the cDNA synthesis kit from Pharmacia. The library was screened with a mixture of degenerate oligonucleotides corresponding to a highly conserved amino acid sequence, WRCDGD, among the ligand-binding domains of LDLR, VLDLR, and apoER2: 5'-TGG(A/C)G(A/C/G/T)TG(C/T)GA(C/T)GG(A/C/G/T)GA-3'. Positive clones hybridizing with the oligonucleotide probe were then reprobed with LDLR and VLDLR probes to eliminate cDNAs for these receptors. By screening 5 X 10' clones, we obtained one positive clone that hybridized with the oligonucleotide probe alone.
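Because the probe is a mixture rather than a single oligonucleotide, it can help to enumerate what it actually contains. The minimal Python sketch below assumes the probe string exactly as printed above (read from the scanned text); the helper names are hypothetical, and the codon table is a deliberately partial subset of the standard genetic code covering only the codons that can occur in this probe.

```python
import itertools
import re

# Degenerate probe as printed in the Methods; each "(X/Y/...)" group is a
# single position that may hold any of the listed bases.
PROBE = "TGG(A/C)G(A/C/G/T)TG(C/T)GA(C/T)GG(A/C/G/T)GA"

def parse_positions(probe):
    """Return one list of allowed bases per nucleotide position."""
    positions = []
    for group, base in re.findall(r"\(([ACGT/]+)\)|([ACGT])", probe):
        positions.append(group.split("/") if group else [base])
    return positions

def expand(probe):
    """Yield every concrete oligonucleotide present in the mixture."""
    for combo in itertools.product(*parse_positions(probe)):
        yield "".join(combo)

# Partial codon table (standard code) limited to codons this probe can contain.
CODONS = {
    "TGG": "W",
    "AGA": "R", "AGG": "R", "CGA": "R", "CGC": "R", "CGG": "R", "CGT": "R",
    "AGC": "S", "AGT": "S",
    "TGC": "C", "TGT": "C",
    "GAC": "D", "GAT": "D",
    "GGA": "G", "GGC": "G", "GGG": "G", "GGT": "G",
}

def translate(seq):
    """Translate complete codons only; a trailing partial codon is ignored."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

variants = list(expand(PROBE))
print(len(parse_positions(PROBE)), len(variants))  # 17 128
print(sorted({translate(v) for v in variants}))     # ['WRCDG', 'WSCDG']
```

Under this reading the mixture contains 128 distinct 17-mers covering W-R-C-D-G plus the first two bases of the final Asp codon. Because (A/C)G(A/C/G/T) also includes AGC and AGT, a quarter of the species (32 of 128) encode Ser rather than Arg at the second position, a common trade-off when a six-codon amino acid such as Arg is covered by a compact degenerate mixture.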

"Zoo" Southern Blot Analysis — Genomic DNAs (10 μg) prepared from a normal man, a male BALB/c mouse, a white Leghorn hen, and a female Xenopus laevis were digested with a large excess of EcoRI, electrophoresed in a 0.8% agarose gel, and transferred onto a nylon membrane. The membrane was hybridized with the entire region of murine LRP4 cDNA. Hybridization was at 42°C in 5× SSC, 5× Denhardt's solution, 200 μg/ml denatured salmon sperm DNA, 50% (v/v) formamide, and 1% (w/v) SDS. The blot was then washed twice with 0.3× SSC and 1% (w/v) SDS at 60°C, followed by autoradiography.

Expression of LRP4 cDNA in COS-7 Cells — To construct an LRP4 expression plasmid (pLRP4-SRα), the entire coding region of murine LRP4 cDNA was inserted into an expression vector (pcDL-SRα296) (22) by multiple ligations of restriction fragments. The expression plasmid was transfected into COS-7 cells according to the transfection protocol described by Chen and Okayama (23).

Lipoprotein Binding Assay — Rabbit β-VLDL (d < 1.006 g/ml) was prepared from the plasma of 1% cholesterol-fed animals (24), 125I-labeled β-VLDL was prepared (25), and its binding by the transfected cells was assayed according to the procedure described previously (2).

RESULTS 

Fig. 2. Functional regions in LRP4. (A) Hydropathy plot analysis of the murine LRP4 protein. The numbers on the x-axis correspond to the positions of the amino acid residues in the protein. The putative transmembrane (TM) region is shown by a thick line. (B) Comparison of the amino acids in the eight ligand-binding repeats of murine LRP4. Amino acid alignment was optimized and gaps were introduced to match the six cysteine residues in each repeat. Amino acid residues conserved in more than 50% of the repeats are boxed and shown below as a consensus sequence. The consensus sequence of the ligand-binding repeats of human LDLR (1) is also represented. (C) Schematic representation of LRPs 1-4, apoER2, LDLR, and VLDLR.

Isolation and Characterization of Murine LRP4 cDNA — A near full-length cDNA encoding a new member of the LDLR family, designated LDLR-related protein 4 (LRP4), was isolated from a murine heart cDNA library by using a mixture of degenerate oligonucleotides corresponding to the highly conserved amino acid sequence WRCDGD among the ligand-binding domains of LDLR, VLDLR, and apoER2. Figure 1 shows the nucleotide and deduced amino acid sequences of the cDNA, which has an open reading frame of 3,339 bp corresponding to 1,113 amino acids, with a calculated molecular mass of approximately 123 kDa. The putative initiator methionine is preceded by an in-frame termination codon 87 nucleotides upstream.
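The reading-frame figures above can be cross-checked with simple arithmetic. The Python sketch below re-derives the residue count from the 3,339-bp reading frame; the average residue mass of 110 Da is a generic rule-of-thumb assumption, not a value from this paper, so the estimate is only expected to land near the reported ~123 kDa rather than reproduce it.

```python
ORF_BP = 3339          # coding nucleotides reported in the text (stop codon excluded)
REPORTED_KDA = 123     # calculated molecular mass reported in the text
AVG_RESIDUE_DA = 110   # assumption: generic average residue mass, not from the paper

# A reading frame must be a whole number of codons.
assert ORF_BP % 3 == 0, "reading frame length is not a multiple of 3"

n_residues = ORF_BP // 3       # one amino acid per codon
stop_codon_end = ORF_BP + 3    # last nucleotide of the stop codon that follows
estimate_kda = n_residues * AVG_RESIDUE_DA / 1000

print(n_residues)                                # 1113
print(stop_codon_end)                            # 3342
print(round(estimate_kda, 1))                    # 122.4
print(round(abs(estimate_kda - REPORTED_KDA), 1))  # 0.6
```

The computed stop-codon end (nucleotide 3342) agrees with the downstream termination codon numbered 3342 in the Fig. 1 legend, and the rough mass estimate differs from the reported value by well under 1 kDa.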
A hydropathy plot (26) of the deduced amino acid sequence of murine LRP4 shows a hydrophobic region at amino acid residues 113-133 (boxed in black in Fig. 1 and marked by a thick line in Fig. 2A). This 21-residue hydrophobic sequence strongly resembles the transmembrane regions of membrane proteins and is flanked by a positively charged amino acid (arginine) on its N-terminal side. This structural feature suggests that LRP4 has a type II transmembrane protein structure (amino terminus in the cytosol).
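A hydropathy plot of this kind is a sliding-window average of per-residue hydrophobicity values. The Python sketch below assumes the Kyte-Doolittle scale with a 19-residue window and a mean-score threshold of 1.6, commonly used defaults for flagging candidate membrane-spanning segments; the text cites its method only as ref. 26, so the scale, window, and threshold are assumptions, not the authors' settings. Because the LRP4 sequence is not legible in this copy of Fig. 1, the demonstration uses a hypothetical toy sequence with a 21-residue hydrophobic stretch.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not stated in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window, keyed by its 1-based centre residue."""
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        mean = sum(KD[aa] for aa in chunk) / window
        profile.append((start + half + 1, mean))
    return profile

def candidate_tm_centres(seq, window=19, threshold=1.6):
    """Centre positions whose window mean exceeds the (assumed) threshold."""
    return [pos for pos, mean in hydropathy_profile(seq, window) if mean > threshold]

# Hypothetical toy sequence, NOT LRP4: a charged N-terminal stretch,
# 21 leucines (residues 6-26), then a polar tail.
toy = "MSKRE" + "L" * 21 + "SDEQNK" + "G" * 5
centres = candidate_tm_centres(toy)
print(centres[0], centres[-1])  # 10 23
```

Every window centre flagged in the toy example falls inside the leucine stretch. The sketch only locates hydrophobic segments; the orientation argument in the text (a positively charged residue on the N-terminal, cytosolic side) is a separate rule that is not modelled here.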
The C-terminal side of the putative transmembrane domain contains two clusters of cysteine-rich repeats that resemble the ligand-binding repeats (class A motifs) of LDLR: one cluster contains three repeats and the other has five (Fig. 2, B and C). Each repeat has six completely conserved cysteines and a highly conserved C-terminal SDE tripeptide, which forms part of the ligand-binding site of LDLR (Fig. 2B). Unlike LDLR, VLDLR, apoER2, LRP1, and LRP2, murine LRP4 contains neither YWTD repeats nor growth factor repeats (class B motifs) (Fig. 2C).

The cytoplasmic domains of LDLR, VLDLR, apoER2, LRP1, and LRP2 contain one or two copies of a highly conserved coated pit signal, FXNPXY (23). In the putative cytoplasmic region (N-terminus) of LRP4, we found neither a typical FXNPXY sequence nor a similar tyrosine-based sequence (27). Further studies are required to determine whether LRP4 functions as an endocytic receptor.
Southern Blot Analysis of the LRP4 Genes in Various Species — To test the possibility that LRP4 homologue genes might also be present in nonmammalian vertebrates (known to lack the apoE gene), Southern blot analysis of genomic DNA from several different organisms was carried out. This "zoo blot" (containing DNAs of human, mouse, chicken, and frog) was hybridized with the entire coding region of the murine cDNA under relatively stringent conditions (see "MATERIALS AND METHODS"). As shown in Fig. 3, intense hybridization signals are present in mouse, and fainter but significant signals can also be detected in human and chicken DNAs. These data suggest the presence of LRP4 homologues in chicken, which lacks the gene encoding apoE, the ligand recognized by the ligand-binding repeats of mammalian LDLR, VLDLR, and apoER2.

Expression of LRP4 Transcripts — Northern blot analysis of RNA from various murine tissues revealed hybridization of the LRP4 probe to a major transcript of 5.0 kb in mouse, with the highest expression in heart, relatively high levels in testis, and much lower levels in kidney and lung (Fig. 4A). Figure 4B shows a blot of RNA from various human tissues hybridized with the murine cDNA. In human tissues, major transcripts of 5, 2.6, and 2.3 kb and a minor transcript of 4 kb are detected almost exclusively in heart. A fainter but significant signal of 2 kb can also be detected in skeletal muscle and testis. The transcripts of 2.0, 2.3, 2.6, and 4 kb detected in human tissues may result from alternative splicing.

Fig. 3. Genomic Southern blot analysis of LRP4-related sequences in various eukaryotic species. A blot containing 10 μg of EcoRI-digested DNA from the indicated species was hybridized with the entire coding region of murine LRP4 cDNA under the conditions described in "MATERIALS AND METHODS" and exposed to Kodak XAR-5 film with an intensifying screen at −80°C for 16 h.

Fig. 4. Expression of LRP4 transcripts in mouse (A) and humans (B). Poly(A) RNA (2 μg) from the indicated murine (A) and human (B) tissues was probed with 32P-labeled murine LRP4 cDNA. The filters were exposed to Kodak XAR-5 film with an intensifying screen at −80°C for 14 h. Control hybridization with a rat glyceraldehyde-3-phosphate dehydrogenase (GAPDH) probe is shown in the lower portion.

β-VLDL Binding — To test the possibility that LRP4 might bind apoE-rich β-VLDL (as do LDLR, VLDLR, and apoER2), an expression plasmid containing the entire coding region of murine LRP4 cDNA was constructed and introduced into COS-7 cells, and ligand-binding activity was measured using 125I-labeled β-VLDL. As shown in Fig. 5A, the level of surface-bound β-VLDL in LRP4-transfected cells was similar to that in cells transfected with equal amounts of the parental vector, despite high levels of the 3.0-kb LRP4 mRNA (lacking approximately 2.0 kb of the 3'-untranslated region) in the LRP4-transfected cells (Fig. 5B). In control experiments, marked induction of 125I-β-VLDL binding was observed in cells transfected with human apoER2.

Fig. 5. Transient expression of LRP4 in COS cells. (A) Surface binding of 125I-labeled β-VLDL. COS cells transfected with an expression plasmid encoding murine LRP4 (pLRP4-SRα), human apoER2 (pNR1), or the parental vector of pLRP4-SRα (pcDL-SRα296) were incubated for 2 h at 4°C with the indicated concentrations of 125I-β-VLDL (540 cpm/ng), after which the values for surface-bound β-VLDL were determined as described under "MATERIALS AND METHODS." (B) Northern blot analysis of LRP4 transcripts in COS cells transfected with murine LRP4 expression plasmid (LRP4) or the parental vector (pcDL-SRα296). Total RNA (10 μg) from the indicated transfected cells was probed with 32P-labeled murine LRP4 cDNA. The filter was exposed to Kodak XAR-5 film with an intensifying screen at −80°C for 12 h.

DISCUSSION 

In the present study, we have shown the structure and expression of a novel member of the LDLR family termed LRP4. The most interesting feature of LRP4 is that, unlike other members of the LDLR family, this protein has a type II membrane protein-like structure. The hydropathy plot analysis shows the presence of a hydrophobic region at amino acid residues 113-133 of murine LRP4. There are eight ligand-binding repeats clustered into two regions on the C-terminal side of this putative transmembrane region. Based on the presence of ligand-binding repeats in the extracellular regions of other LDLR family members, it seems reasonable to predict that the C-terminal side of the putative transmembrane region constitutes the extracellular region of the protein.

Despite the presence of eight ligand-binding repeats, COS cells transfected with LRP4 failed to bind β-VLDL, suggesting that LRP4 does not function in lipoprotein metabolism. Of the four clusters of ligand-binding repeats in LRP2, the recognition site for apoE has been mapped to the second cluster (28). This suggests that these clusters are not functionally equal, despite their structural similarity. Therefore, the ligand-binding repeats in LRP4 may be functionally different from those in other family members that bind β-VLDL.

Although the exact function and ligands of LRP4 remain unclear, the abundant expression of LRP4 transcripts in heart is noteworthy. Based on the structural features of LRP4 and its almost exclusive expression in the heart, LRP4 may act as a surface receptor related to cardiac function. Further studies are necessary to elucidate the exact role of this structurally interesting molecule.
Hepsin, a putative membrane-bound serine protease, was originally identified as a human liver cDNA clone (Leytus, S. P., Loeb, K. R., Hagen, F. S., Kurachi, K., and Davie, E. W. (1988) Biochemistry 27, 1067-1074). In the present study the human hepsin gene was localized to chromosome 19 at q11-13.2. The messenger RNA of hepsin is 1.85 kilobases in size and is present in most tissues, with the highest level in liver. Hepsin is synthesized as a single polypeptide chain, and its mature form of 61 kDa was found in various mammalian cells including HepG2 cells and baby hamster kidney cells. It is present in the plasma membrane in the molecular orientation of type II membrane-associated proteins, with its catalytic subunit (carboxyl-terminal half) at the cell surface and its amino terminus facing the cytosol. Hepsin is found neither in the cytosol nor in culture media. These results suggest that hepsin has an important role(s) in cell growth and function.



Proteases play important roles in a number of physiological and pathological processes such as protein catabolism, blood coagulation, fibrinolysis, and the complement system (1-3). The importance of proteases in many phenomena including cell proliferation, inflammation, development, tumor growth, and metastasis is also well described. Their involvement in carcinogenesis as well as in cell growth is further supported by the anticarcinogenic and anti-cell growth effects of protease inhibitors (4, 5). Most of these are non-membrane-bound intra- or extracellular proteases. Recently, several membrane-associated proteases have been described. A cell surface protease with a molecular weight of 67,000 has been reported (5-7). This protease, which is inhibited by α1-antitrypsin (5), was found to be essential for cell proliferation and was suggested to be involved in various biological processes of cells, in addition to the degradation of extracellular matrix proteins. Guanidinobenzoatase, which can cleave fibronectin at Gly-Arg-Gly-Asp, the sequence involved in the attachment of fibronectin to cell surfaces, has been described (8-10). This protease is located on the surface of most tumor cells, as well as in the fluid surrounding tumor cells. A fluorescent competitive inhibitor has also been used to localize this protease on the tumor cell surface (9). A trypsin-like membrane-associated protease with an estimated molecular weight of 120,000, present in the liver, has been proposed to be involved in membrane protein turnover (11). A membrane-bound trypsin-like protease has also been recognized in other cells such as neuroblastoma cells (12). More recently, a 170-kDa membrane-bound protease (gelatinase) has been implicated in melanoma cell invasiveness (13). As described in these reports, cell surface proteases are considered to play an important role(s) in cell growth, cell invasion of other tissues (such as in metastasis), angiogenesis, and tissue rearrangement, in addition to various other cellular processes.

* This work was supported in part by National Institutes of Health

Hepsin is a putative serine protease of 417 amino acid residues originally identified from cDNA clones isolated from human liver cDNA libraries (14). In a previous study, a synthetic oligonucleotide probe for the amino acid sequence Met-Phe-Cys-Ala-Gly, which is common to many serine proteases, was successfully employed to isolate a number of known and novel proteases including hepsin. Hepsin contains a short hydrophobic amino acid sequence near the amino terminus, while its carboxyl-terminal half is a typical serine protease module. The hydrophobic sequence, composed of 27 amino acid residues, is very similar to the typical lipid bilayer membrane-spanning sequences found in many other membrane-associated proteins (14). In our preliminary immunostaining study, hepsin was found in cultured cells such as HepG2 and baby hamster kidney (BHK)¹ cells (15). It is highly likely that hepsin has a role(s) in cell growth and other cell functions similar to those of the cell membrane-bound proteases described above. At present, however, the protein chemical and enzymatic properties as well as the precise biological role(s) of hepsin are not known.

In this report, we describe evidence that demonstrates the actual existence of hepsin in cells. This includes determination of the estimated molecular weight of cellular hepsin, its subcellular localization, its topology at the cell surface, the chromosomal localization of its gene, and the tissue distribution of its expression.

EXPERIMENTAL PROCEDURES 

Materials — Keyhole limpet hemocyanin and bovine pancreatic trypsin were obtained from Sigma. Freund's adjuvant was purchased from Difco. Synthetic peptides were made by an automated peptide synthesizer (Applied Biosystems, model 438) employing solid-phase t-butoxycarbonyl chemistry. These peptides had free α-carboxyl groups. Activated CH-Sepharose 4B and Percoll were obtained from Pharmacia. Tissue culture supplies and proteinase K were purchased from Gibco/BRL (Life Technologies, Inc.). 14C-labeled size marker protein kits were obtained from Du Pont-New England Nuclear. All radioactive nucleotides were purchased from Amersham Corp. The protein assay kit as well as peroxidase-conjugated goat anti-rabbit IgG were obtained from Bio-Rad. Adenosine 5'-phosphate and 4-chloro-1-naphthol were purchased from Sigma. Nylon membranes (GeneScreen Plus) and the reticulocyte cell-free translation kit were from New England Bio Lab (Du Pont).

¹ The abbreviations used are: BHK, baby hamster kidney; PBS, phosphate-buffered saline; SDS, sodium dodecyl sulfate; EGTA, [ethylenebis(oxyethylenenitrilo)]tetraacetic acid; kb, kilobase.

Preparation of Antibodies — Five synthetic peptides (P1, amino acids 1-17; P2, 246-267; P3, 294-305; P4, 360-372; and P5, 398-417) corresponding to the amino acid sequence of hepsin predicted from the cDNA sequence (14) were employed to raise antibodies. P1, PM (an equimolar mixture of P2, P3, and P4), and P5 correspond to the sequences of the amino-terminal region, the catalytic subunit, and the carboxyl-terminal region, respectively. P1, PM, and P5 were separately coupled to keyhole limpet hemocyanin using glutaraldehyde as a coupling agent as described by Reichlin (16). Rabbits were immunized with the keyhole limpet hemocyanin-peptide conjugate in Freund's adjuvant as follows: 5 mg of the conjugate in complete Freund's adjuvant was injected subcutaneously on day 1, and 1 mg of conjugate in incomplete Freund's adjuvant (1:1) was injected on days 14, 21, and 28. After the third and fourth injections on days 14 and 28, animals were bled from the ear vein to test the titer. After the fifth week, blood samples were collected from the animals by heart puncture and used to prepare affinity-purified antibodies.

Affinity purification of these antibodies was carried out as follows. A peptide column was prepared by adding peptides (10 mg dissolved in 20 ml of 0.1 M NaHCO3, pH 9.0) to activated CH-Sepharose 4B (1 g dry weight) (Pharmacia) according to the manufacturer's instructions. Antiserum (3 ml), which had been incubated with 8 mg of hemocyanin for 1 h at room temperature, was applied to the column (2.6 ml), followed by extensive washing with 10 mM sodium phosphate, pH 7.4, containing 0.15 M NaCl (PBS). The bound immunoglobulins were then eluted with 0.1 M glycine-HCl buffer, pH 2.3, into 0.2 ml of 1 M Tris-HCl buffer, pH 7.0. The eluate was dialyzed against PBS and stored at −80 °C until use. Affinity-purified antibodies prepared against peptides P1, PM, and P5 were designated HAbP1, HAbPM, and HAbP5, respectively. Immunoblot tests showed that HAbPM and HAbP5 were highly specific, whereas HAbP1 was not, probably owing to cross-reactivity with similar amino acid sequences apparently present in other proteins.

Cell Culture — HepG2 cells and BHK cells were cultured in Eagle's 
minimum essential medium (Gibco) supplemented with streptomycin, 
penicillin, and 10% fetal calf serum in a 5% CO2 incubator at 37 °C. 

Fractionation of Cellular Components by Percoll Density Gradient Centrifugation — HepG2 cells (-^-O X 10^ cells) were harvested by scraping, washed twice with PBS (1000 rpm for 5 min at 4 °C), and resuspended in 3 ml of ice-cold STE solution (0.25 M sucrose, 10 mM Tris-HCl buffer, pH 7.5, containing 2 mM EGTA), followed by homogenization with a Tekmar Ultra-Turrax tissue homogenizer for 15 s. Plasma membrane and mitochondrial fractions were isolated by the method of Belsham et al. (17) with minor modifications. Briefly, the homogenates were centrifuged at 100 × g for 1 min. The pellets obtained were resuspended in 2 ml of STE solution, homogenized, and centrifuged. The two supernatants were combined and centrifuged at 5000 × g for 15 min. A fraction (0.5 ml) of the pellet was suspended in 1.0 ml of STE solution, dispersed in 10 ml of iso-osmotic Percoll solution (7 volumes of Percoll, 1 volume of 2 M sucrose, 80 mM Tris-HCl buffer, pH 7.5, containing 8 mM EGTA, and 32 volumes of STE solution), and centrifuged for 20 min at 10,000 × g (Sorvall RC6C with SS34 rotor). Two membrane bands, one immediately below the surface (plasma membrane) and the other close to the bottom (mitochondria), were separately collected into 4 volumes of 10 mM Tris-HCl buffer, pH 7.5, containing 0.15 M NaCl. The two fractions were then centrifuged at 10,000 × g for 3 min to obtain membrane samples. Enrichment of the plasma membrane preparation was monitored by assaying a plasma membrane-associated lipoprotein, 5'-nucleotidase, according to Widnell and Unkeless (18). The purity of the membrane preparation was further tested by assaying the activities of glucose-6-phosphatase (microsome marker) and succinate-cytochrome c reductase (mitochondria marker) according to Sottocasa et al. (19) with minor modifications. The microsome fraction used as a control in the assay was prepared as previously described (19, 20).

An aliquot of the cell homogenates (above) was subjected to centrifugation at 100,000 × g for 30 min at 4 °C in an SW41.1 rotor (Beckman model L5-50 centrifuge). The supernatant was used as the cytosol fraction. The nuclear fraction was prepared from cell homogenates by sucrose density gradient centrifugation according to Blobel and Potter (21).

Plasma membrane, mitochondria, and nuclear fractions were solubilized with 0.2 ml of 10 mM Tris-HCl buffer, pH 7.5, containing 0.15 M NaCl and 0.5% (w/v) Nonidet P-40 and used for immunoblot analysis.

Immunoblot Analysis. Protein concentration of the samples was determined by the method of Bradford (22) with minor modifications. Proteins of solubilized plasma membranes, mitochondria, nuclei, and cytosol, as well as culture media, were adjusted to a concentration of 0.5 mg/ml with gel loading buffer (62.5 mM Tris-HCl, pH 6.8, containing 10% glycerol and 2% SDS) and incubated at 4 °C for 12 h or at 95 °C for 3 min. An aliquot (7.5 µg of protein) of each sample was subjected to SDS-polyacrylamide gel (12%) electrophoresis using a Bio-Rad mini gel apparatus. The electrophoresed proteins were transferred to a nitrocellulose filter according to Towbin et al. (23). The blotted filter was blocked with 3% bovine serum albumin in 50 mM Tris-HCl, pH 7.5, containing 0.15 M NaCl (TBS) at 37 °C for 30 min. It was then incubated at room temperature for 2 h with antibodies (P5) raised against the synthetic peptide containing the carboxyl-terminal sequence of hepsin (500-fold dilution in TBS containing 1% bovine serum albumin). The filter was washed 3 times with TBS containing 0.05% Tween 20 and incubated at room temperature for 2 h with horseradish peroxidase-conjugated goat anti-rabbit IgG diluted 1000-fold. The filters were then incubated with TBS containing 4-chloro-1-naphthol (0.5 mg/ml) for 30 min at room temperature.
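A concentration of 0.5 mg/ml is the same as 0.5 µg/µl, so the loading volume per lane follows directly from the stated concentration and protein mass. The following is a minimal sketch of that arithmetic. It assumes the concentration adjustment was exact, and the function name is hypothetical.

```python
def load_volume_ul(protein_ug: float, conc_mg_per_ml: float) -> float:
    """Volume (µl) needed to load a given protein mass (µg) at a given concentration (mg/ml).

    1 mg/ml equals 1 µg/µl, so the volume is simply mass divided by concentration.
    """
    if conc_mg_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return protein_ug / conc_mg_per_ml

# 7.5 µg per lane at 0.5 mg/ml, as in the immunoblot protocol above
print(load_volume_ul(7.5, 0.5))  # 15.0
```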

Proteolysis of HepG2 Cells. Mild proteolysis of HepG2 cells to test the topology of hepsin at the cell surface was carried out as follows. HepG2 cells (about 90% confluency) in nine 10-cm culture dishes (total of about 4.5 x 10^ cells) were washed twice with phosphate-buffered saline (0.15 M NaCl, 8 mM Na2HPO4, 0.6 mM KH2PO4), pH 7.4. They were then incubated in the buffer for 30 min on ice with or without 10 µg/ml proteinase K or 100 µg/ml bovine pancreatic trypsin. Under these conditions, HepG2 cells did not significantly lose their viability. Cells were then washed twice with the phosphate buffer and used for preparing plasma membrane proteins as described above. Aliquots (20 µg each) of protein samples were subjected to immunoblot analysis as described above, employing the affinity-purified antibody P5.

Fluorescent Immunostaining of Cultured Cells. Cells were maintained at 37 °C in 5% CO2 in minimum essential medium containing 10% fetal calf serum and antibiotics. Cells grown to subconfluency on coverslips (8 wells/slide; Miles Laboratories) were fixed at room temperature for 10 min with 2% paraformaldehyde and 0.2% glutaraldehyde in PBS containing Ca2+ and Mg2+ (Gibco). After several rinses with PBS, cells were incubated with goat serum at a dilution of 1:20 in PBS at room temperature for 15 min to block nonspecific binding of the antibody. After several additional rinses with PBS, cells were incubated with purified anti-synthetic peptide IgG (HAbPM, 2-5 µg/ml, which recognizes the middle portion of the putative catalytic subunit). The incubation was carried out in PBS containing bovine serum albumin (1 mg/ml), with and without 0.05% Triton X-100, for 2 h in humidified Petri dishes. The bound IgG was visualized by incubating for 30 min with fluorescein isothiocyanate-labeled goat anti-rabbit IgG (diluted 1:50 with PBS). Three kinds of control experiments were performed. In the first, the antibodies were preincubated with the synthetic peptides used for raising them (PM, 1 mg/ml) before incubation with cells. In the second, PBS containing no antibodies, with or without synthetic peptides (1 mg/ml), was added to cells. In the third, anti-hepsin antibodies were replaced with antibodies against human blood coagulation factor IX. To test for intracellular immunostaining, cells were treated with 0.5% Triton X-100 for 3-5 min before incubation with the antibodies (HAbPM). The cells were immediately examined by fluorescence microscopy and photographed. HAbP1 antibodies (specific for the amino-terminal region) were not used in this experiment because their specificity was low in immunoblot analysis: they recognized not only the 51-kDa band but also a significant number of other bands.

RNA Blot Analysis. Total RNAs of various baboon tissues were prepared by the guanidinium isothiocyanate method described by Chomczynski and Sacchi (24). RNA preparations (20 µg for each tissue) were electrophoresed in a 1.5% agarose gel containing 6.7% formaldehyde in 20 mM phosphate buffer, pH 7.0 (25). The agarose gels were then blotted onto GeneScreen Plus membranes (Du Pont/New England Nuclear), followed by baking for 2 h at 80 °C. A hepsin cDNA (1.8 kb) (14) was labeled with [α-32P]dCTP by using an oligolabeling kit (Pharmacia) to a specific activity of about 1 × 10^ cpm/µg. Prehybridization, hybridization with the radiolabeled cDNA probe, and washing were carried out as described by the manufacturer for the GeneScreen Plus membrane. The membrane was then exposed to x-ray film (Kodak X-Omat AR) at −70 °C. A ribosomal RNA gene probe was used to confirm the presence of RNAs in each lane of the blot.

Molecular Mapping of the Hepsin Gene. A panel of somatic cell hybrids for mapping was established by PEG 1000-mediated cell fusion of human VA2, Ari-19, or IMHlH fibroblasts or peripheral human lymphocytes to either Chinese E36 or Syrian HHK-Bl hamster cells as previously described (26). The hybrid panel was characterized for human chromosome content by screening up to 'M gene enzyme systems and, in selected cases, by cytogenetic analyses. 32P-labeled hepsin cDNA (1.8 kb) (1-3 × 10^ dpm/µg) was hybridized to DNA blots of these hybrids and controls. The DNA had been digested to completion with HindIII, BamHI, or EcoRI, electrophoresed, and blotted as described (26).

In situ chromosomal hybridization was carried out as follows. Human metaphase cells were prepared from phytohemagglutinin-stimulated peripheral blood lymphocytes (27). A radiolabeled, hepsin-specific cDNA probe was prepared by nick translation of the entire plasmid with all four 3H-labeled deoxynucleoside triphosphates, to a specific activity of 1-2 × 10^ dpm/µg. In situ hybridization was performed as described previously (27). Metaphase cells were hybridized at 2.0 and 1.0 µg of probe/ml of hybridization mixture. Autoradiographs were exposed for 11 days.

Cell-free Transcription of Hepsin cDNA and in Vitro Translation. Hepsin cDNA (1.8 kb) (14) was inserted into the pSG5 vector (Stratagene) in both orientations at the unique EcoRI site downstream of the T7 promoter. The chimeric plasmid was transformed into Escherichia coli TB-1 cells and amplified. Plasmid DNA was then prepared by the alkaline-SDS method and CsCl gradient ultracentrifugation. The plasmids were linearized by digestion with XbaI, which cuts downstream of the insert in the vector sequence, and then incubated with proteinase K (50 µg/ml) at 37 °C for 30 min. The reaction mixture was extracted twice with phenol/chloroform (1:1) and ethanol-precipitated before use in transcription reactions. The linearized plasmid DNAs were dissolved in TE buffer (10 mM Tris-HCl, pH 7.4, 0.1 mM EDTA, prepared with diethyl pyrocarbonate-treated water) and used as templates for transcription. Cell-free transcription was carried out at 37 °C for 30 min with T7 RNA polymerase using an mRNA capping kit (Stratagene) according to the manufacturer's instructions. The transcription reaction mixture was then treated with 2.5 units of RNase-free DNase I for an additional 5 min at 37 °C. Synthesized RNA was extracted once with phenol/chloroform (1:1), precipitated with ethanol, dissolved in TE buffer, and used in translation reactions. The synthesized RNAs were quantitated by reading the absorbance at 260 nm, and their size was determined by formaldehyde-agarose gel electrophoresis. Generally, about 40-45 µg of RNA (1.8 kb) was obtained from 2-5 µg of DNA template.
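RNA was quantitated from its absorbance at 260 nm. The sketch below assumes the commonly used conversion factor of about 40 µg/ml per A260 unit for single-stranded RNA in a 1-cm path. This is a standard approximation, not a value stated in the text. The function names and the example reading are hypothetical.

```python
# Conventional approximation (assumed, not from the source):
# A260 = 1.0 in a 1-cm path corresponds to about 40 µg/ml single-stranded RNA.
RNA_UG_PER_ML_PER_A260 = 40.0

def rna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate the RNA concentration (µg/ml) of the undiluted sample from an A260 reading."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor

def rna_yield_ug(a260: float, dilution_factor: float, volume_ul: float) -> float:
    """Total RNA (µg) in a sample of the given volume (µl)."""
    return rna_concentration_ug_per_ml(a260, dilution_factor) * volume_ul / 1000.0

# Hypothetical reading: A260 = 0.21 for a 1:100 dilution of a 50-µl transcript preparation
print(round(rna_yield_ug(0.21, 100, 50), 1))  # 42.0
```

This hypothetical reading corresponds to about 42 µg, which falls within the 40-45 µg range reported above.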

The prepared hepsin RNA (1-2 µg) was then translated at 30 °C in the presence of [35S]methionine using the rabbit reticulocyte lysate system (New England Biolabs) according to the manufacturer's instructions. An aliquot (5 µl) of the translation reaction mixture (25 µl) was mixed with loading buffer, heated in boiling water for 5 min, and subjected to SDS-polyacrylamide gel (15%) electrophoresis. After electrophoresis, the gel was treated with Amplify (Amersham) for 15 min according to the manufacturer's instructions to enhance the radioactive signals, dried, and exposed to x-ray film at −70 °C.

RESULTS

Subcellular Localization of Hepsin. Immunoblot analysis of HepG2 and BHK cells is shown in Fig. 1. Based on the 5'-nucleotidase activity assayed, the plasma membrane preparation used in this experiment was found to be enriched 18-fold over the crude cell membrane starting material. The membrane preparation was highly pure, with little contamination by microsomes and mitochondria, as monitored by glucose-6-phosphatase and succinate-cytochrome c reductase (<0.2% and 0.5% contamination, respectively). Protein bands of 51 and 28 kDa were observed at high levels in the extracts of cell membrane fractions prepared from
Fig. 1. Immunoblot analysis of HepG2 and BHK cells. Experimental details are described under "Experimental Procedures." Aliquots (7.5 µg) of proteins of various cell subcomponents and media were loaded in each lane. Lane 1, BHK cell membranes; lane 2, HepG2 cell membranes; lane 3, HepG2 cytosol; lane 4, HepG2 media. The numbers on the left show the positions of size markers.
Fig. 2. Cell-free translation assay of hepsin cDNA. Lane 1, 14C-labeled size marker proteins (from the top: myosin, γ-globulins, phosphorylase b, bovine serum albumin, ovalbumin, carbonic anhydrase, lactoglobulin, and cytochrome c); lane 2, no RNA added; lanes 3 and 4, 0.42 and 1.7 µg of in vitro transcripts (sense strand) added, respectively; lanes 5 and 6, 0.42 and 1.7 µg of in vitro transcripts (antisense strand) added, respectively; lane 7, 1.8 µg of RNA transcribed from pSG5 (no hepsin insert); lane 8, 2 µg of human placenta RNAs. The numbers on the left indicate the positions and sizes of relevant size marker proteins.

HepG2 cells, while BHK cells showed only the major band (51 kDa). These bands were competed out by adding P5 (the synthetic peptide of the carboxyl-terminal region of hepsin), which was used to raise the antibodies employed in the immunoblot analysis. These bands were also present at reduced levels in nuclear and mitochondrial fractions (data not shown), but not in the cytosol or in culture media. The presence of hepsin in nuclei and mitochondria may be due in part to contamination of these fractions with cell membranes. These results indicate that hepsin is a protein primarily associated with the plasma membrane.
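The 18-fold figure is a fold enrichment: the specific activity (activity per mg protein) of the marker enzyme in the purified fraction divided by its specific activity in the starting material. The sketch below illustrates that ratio with hypothetical 5'-nucleotidase values. The text reports only the resulting 18-fold enrichment, not the underlying measurements.

```python
def specific_activity(activity_units: float, protein_mg: float) -> float:
    """Enzyme activity per mg of protein."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return activity_units / protein_mg

def fold_enrichment(fraction_sa: float, starting_sa: float) -> float:
    """How many times higher a marker's specific activity is in a fraction than in the starting material."""
    return fraction_sa / starting_sa

# Hypothetical 5'-nucleotidase measurements: (activity units, mg protein)
crude_membranes = specific_activity(36.0, 12.0)   # 3.0 units/mg
plasma_membrane = specific_activity(27.0, 0.5)    # 54.0 units/mg
print(fold_enrichment(plasma_membrane, crude_membranes))  # 18.0
```

The same ratio, computed with a microsomal or mitochondrial marker enzyme, gives the contamination percentages quoted for glucose-6-phosphatase and succinate-cytochrome c reductase.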

Cell-free Translation Analysis. When in vitro transcripts of hepsin cDNA were used in cell-free translation assays, a specific polypeptide band of 44 kDa was observed by SDS-polyacrylamide gel electrophoresis (Fig. 2). The estimated size of the band agreed reasonably well with that expected from the cDNA (14). The larger molecular size (51 kDa) observed in immunoblot analyses of all cell extracts may reflect post-translational modifications, such as glycosylation, that are absent from the cell-free translation product. A possible site for N-linked carbohydrate chain attachment is at amino acid 112 of the hepsin molecule. Hepsin may also contain O-linked carbohydrate chains.

Tissue Distribution of Hepsin Gene Expression. The tissue distribution of hepsin expression was analyzed by RNA blot analysis of total RNA samples prepared from young adult baboon tissues: the hypothalamus, small intestine, pancreas, testis, salivary gland, skeletal muscle, lung, adrenal gland, thyroid, pituitary gland, liver, spleen, kidney, brain, and thymus (Fig. 3). The hepsin mRNA was about 1.8 kb in size and was found at the highest level in the liver. It was also present, albeit at much lower levels, in other tissues including the kidney, pancreas, lung, thyroid, pituitary gland, and testis. Extremely low levels of the mRNA were found in the thymus, spleen, small intestine, and adrenal gland. These results indicate that hepsin is ubiquitously expressed in various tissues, with the highest expression in the liver.

Chromosomal Localization of the Hepsin Gene. To obtain a chromosome assignment for the hepsin gene, a hepsin cDNA probe was hybridized to Southern blots of a panel of somatic cell hybrids. The results showed perfect concordance between human chromosome 19 and hepsin (Table I). Significant discordance (27-59%) was observed between hepsin and all of the other human chromosomes.
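In a synteny test, a chromosome is counted as discordant with the gene in every hybrid that retains the gene but not the chromosome, or the chromosome but not the gene. Percent discordance is that count divided by the number of hybrids scored. The sketch below uses a small hypothetical panel because the body of Table I is not reproduced here.

```python
from typing import Dict, List

def percent_discordance(gene: List[bool], chromosome: List[bool]) -> float:
    """Percentage of hybrids in which gene and chromosome presence disagree (+/- or -/+)."""
    if len(gene) != len(chromosome) or not gene:
        raise ValueError("need equal-length, non-empty scoring lists")
    discordant = sum(g != c for g, c in zip(gene, chromosome))
    return 100.0 * discordant / len(gene)

# Hypothetical panel of 8 hybrids: True = present, False = absent
hepsin = [True, True, False, True, False, False, True, False]
panel: Dict[str, List[bool]] = {
    "chr19": [True, True, False, True, False, False, True, False],
    "chr1": [True, False, False, True, True, False, False, False],
}
for name, scores in panel.items():
    print(name, percent_discordance(hepsin, scores))
# chr19 0.0
# chr1 37.5
```

A chromosome with 0% discordance, like chromosome 19 in Table I, is the candidate location of the gene.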

To confirm the chromosomal localization of the hepsin gene by an independent method and to sublocalize the gene, we hybridized a hepsin-specific probe (cDNA) to normal metaphase chromosomes. Only chromosome 19 was specifically labeled. Of 100 metaphase cells examined from this hybridization, 39 were labeled on region q1 of one or both chromosome 19 homologues. The distribution of labeled sites on this chromosome is illustrated in Fig. 4. Of 224 total labeled sites observed, 64 (28.6%) were located on chromosome 19. Of these 64 sites, 49 were clustered at bands q11-13.2. This cluster represented 21.9% (49/224) of all labeled sites (cumulative probability for the Poisson distribution, <0.0005). The largest number of grains was observed at 19q13.1. Similar results were obtained in three additional hybridization experiments using this probe. Thus, the hepsin gene is localized to chromosome 19, at bands q11-13.2.
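The percentages above follow from the grain counts. The significance statement is an upper-tail Poisson probability: the chance of seeing at least the observed number of grains in a region if grains landed at random in proportion to chromosome length. The expected count depends on the relative length of bands q11-q13.2, which the text does not give. The 1% used below is therefore a hypothetical placeholder, not the authors' value.

```python
import math

def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Adequate for comparing against a threshold such as 0.0005; very small tail
    probabilities are limited by floating-point precision.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i        # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

total_sites, on_chr19, in_cluster = 224, 64, 49
print(round(100 * on_chr19 / total_sites, 1))    # 28.6
print(round(100 * in_cluster / total_sites, 1))  # 21.9
print(round(100 * in_cluster / on_chr19, 1))     # 76.6

# Hypothetical: if bands 19q11-q13.2 made up 1% of the haploid genome length,
# about 224 * 0.01 = 2.24 grains would be expected there by chance.
expected = total_sites * 0.01
print(poisson_upper_tail(in_cluster, expected) < 0.0005)  # True
```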

Immunofluorescent Staining of Cultured Cells. Cultured cells, including HepG2 and BHK cells, were immunostained for hepsin with antibodies (HAbPM) raised against synthetic peptides (PM, an equimolar mixture of P1, P2, and P3) designed from the catalytic subunit of hepsin (Fig. 5). The antibodies uniformly stained HepG2 cells. BHK cells were also stained, but at reduced intensity. The staining was completely competed out when the synthetic peptides used
Fig. 3. RNA blot analysis of young adult baboon tissue. Each lane contained 20 µg of total RNAs isolated from a young adult baboon. Lanes 1-15 contain hypothalamus, small intestine, pancreas, testis, salivary gland, skeletal muscle, lung, adrenal gland, thyroid, pituitary gland, liver, spleen, kidney, brain, and thymus, respectively. The sizes and positions of RNAs are shown at the right. A hepsin cDNA (1.8 kb) was used as the radiolabeled probe in this experiment.



Table I

Synteny test of the hepsin gene and human chromosomes in rodent-human hybrid clones

Somatic cell hybrids were scored for the presence (+) or absence (−) of specific human chromosomes by gene enzyme and cytogenetic analyses, and for the presence or absence of the hepsin coding segment by Southern blot hybridization.
Fig. 4. Distribution of labeled sites on chromosome 19 in 100 normal human metaphase cells from phytohemagglutinin-stimulated peripheral blood lymphocytes that were hybridized with the hepsin probe. Of 64 labeled sites observed on chromosome 19, 49 (76.6%) were clustered at 19q11-13.2; the largest cluster of grains was located at 19q13.1.

for raising the antibodies were preincubated with the antibodies, indicating that the staining of the cells is specific (Fig. 5B). Antibodies raised against the synthetic peptide P5 (the carboxyl-terminal region) gave similar results (data not shown). Permeabilizing the cells with Triton X-100 did not produce any significant increase or change in staining (data not shown). No significant staining of the cells was observed in control experiments in which antibodies specific for blood coagulation factor IX were used or anti-hepsin antibodies were omitted. These immunostaining patterns show that hepsin has its catalytic subunit (the carboxyl-terminal half) primarily at the cell surface. Consequently, its amino-terminal portion is likely to face the cytosol. The immunostaining results for cultured cells and tissues are consistent with this molecular orientation of hepsin at the cell membrane. These results also agree well with the immunoblot analysis, which showed hepsin to be located primarily in the cell membrane fraction. The HAbP1 antibody, raised against the NH2-terminal region of hepsin, could not be used to confirm these results further because of its low specificity.

Mild Proteolysis of HepG2 Cells. To further test the topology of hepsin, HepG2 cells were mildly digested with trypsin (100 µg/ml) or proteinase K (10 µg/ml) on ice. The results of immunoblot analyses of these protein samples are shown in
Fig. 5. Fluorescent immunostaining of HepG2 cells. Panel A, cells stained with antibodies raised against the catalytic domain (HAbPM); Panel B, cells stained in the presence of antigen peptides. Experimental details are described under "Experimental Procedures."




Fig. 6. Immunoblot analysis of plasma membrane proteins prepared from HepG2 cells with and without mild proteolysis. Lane 1, membrane proteins (20 µg) of HepG2 cells treated with proteinase K (10 µg/ml); lane 2, membrane proteins (20 µg) of HepG2 cells treated with trypsin (100 µg/ml); lane 3, membrane proteins (20 µg) of HepG2 cells without protease treatment. Bands a and c correspond to the 51- and 28-kDa hepsin bands in Fig. 1. Band b corresponds to partially degraded hepsin. Antibodies prepared against the carboxyl-terminal region (HAbP5) were used in this experiment.

Fig. 6. The protein bands a and c in control lane 3 correspond to the 51- and 28-kDa bands of hepsin in Fig. 1. When the cells were treated with trypsin (lane 2), both bands a and c were greatly reduced in intensity compared with the untreated control (lane 3). When the cells were very mildly treated with proteinase K (10 µg/ml, lane 1), both bands a and c decreased in intensity and a new band, b, appeared, likely derived from band a. These results suggest that limited proteolysis, mild enough to maintain cellular integrity and viability, significantly degrades the carboxyl-terminal portion (the catalytic subunit) of hepsin. This further supports the molecular orientation of hepsin with its catalytic subunit at the cell surface, exposed to the extracellular space.

DISCUSSION 

The results of our studies demonstrate that hepsin, origi- 
nally identified as a putative membrane-bound protease, is 
present in the cell membranes. We have also characterized its 
molecular size, tissue distribution of expression, and the chro- 
mosomal localization of its gene. 

The size of the mRNA for baboon hepsin is estimated to be about 1.85 kb. The human hepsin mRNA produced in HepG2 cells has a similar size and agrees well with that predicted from the cDNA. The hepsin gene is located at 19q11-13.2. The molecular mapping results and Southern blot analysis of human genomic DNA suggest that hepsin is encoded by a single-copy gene.

Antibodies raised against synthetic peptides designed from various parts of the hepsin sequence predicted from the cDNA were successfully used to characterize hepsin and analyze its expression. Immunoblot analysis of membrane proteins of HepG2 cells showed two polypeptide bands of 51 (major) and 28 kDa (minor) (Fig. 1), whereas BHK cells had only the major band (51 kDa). This major band agrees well with the molecular sizes predicted from the cDNA and the cell-free translation experiment. The minor 28-kDa band is considered to be a degradation product derived from the putative catalytic subunit portion of the 51-kDa species. Under reducing conditions, the apparent size of the 51-kDa band increased slightly. This indicates that the band represents a single polypeptide chain that has not undergone any degradation during the membrane protein extraction procedures employed. We speculate that proteolytically activated hepsin, which may be composed of two subunits (162 and 255 amino acid residues) linked by a disulfide bond (14), may be efficiently cleared from the cell membrane. We base this on the absence of any significant amount of the expected subunits in gel electrophoresis under reducing conditions. Such clearance may occur by binding to a specific inhibitor(s) or by accelerated degradation through an unknown mechanism. In vitro translation assays of the RNA transcripts of hepsin cDNA showed a distinct specific band of about 44 kDa, which agrees reasonably well with the size predicted from the cDNA sequence. This size is also consistent with that observed for cultured cells if we take into account potential post-translational modifications, such as glycosylation, which may increase the molecular mass to the apparent 51 kDa. A potential site for N-linked carbohydrate chain attachment is located at amino acid 112. At present, we do not know whether this site is glycosylated, or whether any O-linked carbohydrate chains are attached to the mature hepsin molecule.

As shown by RNA blot analysis of baboon tissues (Fig. 3), hepsin appears to be ubiquitously expressed in various tissues, with a particularly high level in the liver. The expression of hepsin in various tissues suggests that this protease may be involved in an essential biological process(es) in many different cells. In HepG2 cells, hepsin is present at high levels in the cell membrane fraction, but not in the cytosol or in culture media (Fig. 1). Nuclear and mitochondrial fractions also contained lower amounts of hepsin of the same molecular weights (data not shown). The fluorescent immunostaining experiments show that hepsin is primarily a cell membrane-associated protease, with its catalytic subunit (the carboxyl-terminal half) at the cell surface. The patterns of fluorescent immunostaining of various tissues are consistent with this molecular orientation. Mild protease treatment of intact HepG2 cells greatly decreases the intensity of the hepsin bands in immunoblot analysis (Fig. 6), which further supports this orientation. We also compared the 15 amino acid residues immediately flanking each side of the hydrophobic sequence of hepsin. The NH2-terminal flanking sequence carried a net charge of +4, whereas the COOH-terminal flanking sequence carried no net charge. This agrees well with the consensus topological sequence for type II membrane proteins derived from well-defined membrane-spanning proteins (28-30). Furthermore, the residue immediately flanking the NH2-terminal side of the hydrophobic sequence is a positively charged residue, lysine. This also agrees with the consensus sequence for the topology of type II membrane proteins recently proposed by Parks and Lamb (31). These observations support the premise that newly synthesized hepsin is transported within the cell by a mechanism analogous to that of other reported membrane-bound proteins.
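The flanking-charge comparison can be written as a small computation. The sketch below assumes a simplified charge count: +1 for Lys or Arg and -1 for Asp or Glu, with histidine and the chain termini ignored. It uses a made-up toy sequence because the hepsin sequence itself is not reproduced here, and the function names are hypothetical.

```python
# Simplified charge count at neutral pH (assumption of this sketch):
# Lys/Arg = +1, Asp/Glu = -1; histidine and the chain termini are ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def net_charge(segment: str) -> int:
    """Net charge of a peptide segment under the simplified scheme above."""
    return sum(CHARGE.get(aa, 0) for aa in segment.upper())

def flank_charges(sequence: str, tm_start: int, tm_end: int, window: int = 15) -> tuple:
    """Net charges of the `window` residues before and after a hydrophobic segment.

    tm_start and tm_end are 0-based, end-exclusive indices of the segment.
    """
    n_flank = sequence[max(0, tm_start - window):tm_start]
    c_flank = sequence[tm_end:tm_end + window]
    return net_charge(n_flank), net_charge(c_flank)

def fits_type_ii_pattern(sequence: str, tm_start: int, tm_end: int, window: int = 15) -> bool:
    """True if the N-terminal (cytoplasmic) flank is more positive than the C-terminal flank."""
    n_charge, c_charge = flank_charges(sequence, tm_start, tm_end, window)
    return n_charge > c_charge

# Hypothetical toy sequence: 15-residue N flank + 15-residue hydrophobic core + 15-residue C flank
toy = "MSKRQAPRGSLEKSK" + "LLLALAVLLAVGLLA" + "GSTANQSGDRTASGA"
print(flank_charges(toy, 15, 30))         # (4, 0)
print(fits_type_ii_pattern(toy, 15, 30))  # True
```

The toy flanks were chosen to mirror the +4 and 0 net charges described for hepsin. This is a heuristic for orientation, not a substitute for the experimental topology data.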

Several proteases with a similar cellular localization and 
orientation have been reported (8, 11, 13). Hepsin, however, 
is novel and distinct from each of these proteases reported to 
date. 

Proteases have been shown to be present during cell migration (32) and during the tissue rearrangement (33) involved in morphogenesis. In these settings, they are assumed to create space for cell migration and process extension through the extracellular matrix and cell-filled milieu. Their role in cell growth can be inferred from their presence on immature but not mature glial cells (34), for example, or from the highly developmentally regulated appearance of tissue plasminogen activator in maturing sperm (35). Although the precise biological role(s) of hepsin is unknown at present, we postulate that hepsin also plays an important role(s) in cell growth. It probably does so by creating space for growing cells through degradation of a specific extracellular matrix protein(s) or a protein(s) in the tissue. In this regard, we recently observed that hepsin is expressed at a greatly elevated level in actively dividing cells in such tissues as the basal layer of the epidermis of developing skin. Hepsin may also have a role in other cell functions in normal as well as pathological conditions. In our preliminary results, antisense oligonucleotides of hepsin had a significant effect on both the growth rate and the morphology of HepG2 and BHK cells in culture, supporting this hypothesis. Like some other membrane proteases (13), hepsin may also play an important role in the metastasis of tumor tissues; however, this has yet to be tested.

Determining the substrate specificity of hepsin is clearly very important for defining its precise biological role(s). In our preliminary assay, hepsin highly enriched on an antibody affinity column showed strong activity toward N-benzoyl-Leu-Ser-Arg-pNA·HCl, but it did not cleave N-benzoyl-Glu-Phe-Ser-Arg-pNA·HCl. To characterize its specificity further, efforts to isolate hepsin in quantity from cultured cells and tissues are in progress. Determination of its concentration in various tumor tissues is also in progress in our laboratory.
Previous studies have suggested the existence of a membrane-associated serine protease expressed by mammalian preimplantation embryos. In this study, we have identified hepsin, a type II transmembrane serine protease, in early mouse blastocysts. Mouse hepsin was highly homologous to the previously identified human and rat cDNAs. Two isoforms, differing in their cytoplasmic domains, were detected. The tissue distribution of mouse hepsin was similar to that seen in humans, with prominent expression in liver and kidney. In mouse embryos, hepsin expression was observed at the two-cell stage, reached a maximal level at the early blastocyst stage, and decreased after blastocyst hatching. Expression of a soluble form of hepsin revealed its ability to autoactivate in a concentration-dependent manner. Catalytically inactive soluble hepsin was unable to autoactivate. These results suggest that hepsin may be the first serine protease expressed during mammalian development, making its ability to autoactivate critical to its function.



Embryonic development is marked by a series of cellular divisions and morphogenetic changes (1). These processes are mediated by the complex expression and interplay of different sets of genes, some of which are derived from maternally expressed genes stored as mRNAs in the oocytes. It is generally accepted that zygotic gene expression begins at the embryonic two-cell stage (2). These newly expressed zygotic genes complement the maternally expressed genes to mediate early preimplantation development. Numerous studies have suggested the involvement of a variety of proteases during development. Members of the astacin family of metalloproteases are involved in several such processes:

- hatching in both invertebrates and vertebrates (3-6);
- pattern and tentacle cell formation in the hydra, by HMP1 (7);
- neuroblast migration in Caenorhabditis elegans, by hch-1 (3);
- dorsal/ventral patterning in Drosophila, by Tolloid (8);
- biomineralization and bone/cartilage formation in mammals, by BMP-1 (9, 10).

Interestingly, both Tolloid and BMP-1 can physically interact with transforming growth factor-β (8, 9). This association is essential for normal development, perhaps to activate latent transforming growth factor-β complexes. In addition, BMP-1 has been shown to be the procollagen C-endopeptidase (EC 3.4.24.19) required for processing type I, II, and III procollagen to fibrillar collagens, the major fibrous components of vertebrate extracellular matrix (11, 12).

Proteases have also been shown to play essential roles in cell differentiation. Recently, new members of the adamalysin/reprolysin family of metalloproteases have been described and shown to have a direct role in a number of developmental processes. Fertilin-α and -β, the first members of this family, have been shown to have essential roles in sperm-egg fusion during fertilization (13-15). Meltrin-α, a recently discovered fertilin-related member of the adamalysin/reprolysin metalloproteases, is important for myoblast fusion during skeletal muscle development. Its discovery suggests that gamete and myoblast fusion may share a common mechanism (16). Astacin-like proteases of the Tolloid/BMP-1 family play important roles in cell differentiation and morphogenesis in animal embryos ranging from the hydra and sea urchins to mammals (17).

Serine proteases have also been implicated in development. An example is the product of the Drosophila gene stubble-stubbloid, which genetic studies have shown to be essential for epithelial morphogenesis of Drosophila imaginal discs (18). Mutations in this gene affect imaginal disc formation and the organization of microfilament bundles, leading to defects in bristle, leg, and wing morphogenesis. Also in Drosophila, the maternally transcribed product of the easter gene, a trypsin-like serine protease, is essential for establishing a normal dorsal-ventral pattern in the embryo (19). Notably, dominant mutations that quantitatively perturb the level of Easter protease activity in Drosophila can disrupt the dorsal-ventral axis, leading to ventralizing and lateralizing phenotypes (20). The Drosophila trypsin-like enzymes easter and snake are part of a cascade of zymogen activation that leads to the conversion of the ligand precursor spätzle to its active form (21-23). Active spätzle then activates its receptor, Toll, which in turn directs the specification of dorsal and lateral cell fates (24, 25).

Although there is evidence that one or more serine proteases are present in mammalian preimplantation embryos, the identity of these enzymes has remained elusive. One of the earliest events in embryogenesis thought to require a protease is blastocyst hatching. Hatching involves the proteolysis of the zona pellucida, an event critical for the subsequent uterine implantation of the embryo. Studies have suggested that a single, membrane-associated serine protease is expressed by hatching blastocysts (26). In this study, we identify hepsin, a serine protease containing a transmembrane domain, as a serine protease expressed by mouse embryos from the two-cell stage through the early blastocyst stage. In addition, we demonstrate that a soluble form of hepsin lacking the transmembrane domain undergoes autoactivation. This suggests a mechanism by which hepsin becomes proteolytically activated in the absence of other proteases.
Fig. 1. Nucleotide and predicted amino acid sequences of mouse hepsin. The internal signal peptide sequence serving as a transmembrane (TM) domain is underlined. The zymogen activation cleavage site (arrow), catalytic triad residues (asterisks), and Asp (circle) are depicted.



EXPERIMENTAL PROCEDURES 

Collection and Culture of Mouse Preimplantation Embryos. Experiments with preimplantation embryos were performed on cultured two-cell stage embryos obtained from B6C3F1 prepubescent female mice (Charles River Laboratories) weighing 10-13 g. Mice were injected intraperitoneally with 5 IU of pregnant mare's serum gonadotropin (Sigma), followed 48 h later by 5 IU of human chorionic gonadotropin (Sigma). Each female was then paired with a single male overnight, and females were checked for vaginal plugs the following day (day 1). On day 2, the mice were dissected to obtain the oviducts. The oviducts were bathed in sperm washing medium (Irvine Scientific) and dissected to release the two-cell embryos. About 40-50 two-cell embryos were pooled and cultured under oil at 37 °C in a humidified atmosphere of 5% CO2 in air. Culture was in 50-µl droplets of human tubal fluid (Irvine Scientific) plus 0.5% human serum albumin (Irvine Scientific). Cultures were maintained for 4-5 days or until expanded blastocysts began to hatch.

RNA Isolation and First-strand cDNA Synthesis — Total RNA was 
isolated from 100-200 hatching blastocysts (embryonic day 4.5) according
to the method of Chomczynski and Sacchi (27). All of the RNA obtained was
then used in a first-strand cDNA synthesis reaction with Superscript reverse
transcriptase (Life Technologies, Inc.) and oligo(dT) as primer. The
reaction was incubated at 42 °C for 1 h. RNase H (Life Technologies, Inc.)
was then added, and the reaction was incubated at 37 °C for 20 min to remove
the RNA template.

PCR^ Amplification, Cloning, and Sequencing of Mouse Hepsin — To 



¹ The abbreviations used are: PCR, polymerase chain reaction; bp,
base pair(s); kb, kilobase pair(s); RT, reverse transcriptase; PAGE,
polyacrylamide gel electrophoresis; fVII, factor VII; fVIIa, activated
factor VII; pBS, pBluescript.



identify the serine protease involved in mouse blastocyst hatching,
degenerate oligonucleotides 5'-TGCTCTAGATGG(A/G)TINTI(A/T)(G/C)IGCIGCICA-3'
and 5'-CCGGAATTCA(A/G)IGGI(G/C)(ACT)ICCI(G/C)(A/T)(A/G)TCICC-3'
(Molecular Biology Resource Facility, OUHSC), based on two conserved regions
of known serine proteases, were used to amplify a 500-bp DNA fragment
encoding part of the protease catalytic domain from hatching blastocyst RNA.
Aliquots of first-strand cDNA were incubated with 0.1 µM each of the 5'- and
3'-primers, 100 µM dNTPs, 1× PCR buffer, and 2.5 units/100 µl of AmpliTaq
DNA polymerase (Perkin-Elmer). The reactions were cycled 40 times through
the following steps: 30 s at 94 °C, 30 s at 55 °C, and 1 min at 72 °C in a
Perkin-Elmer DNA thermocycler model 2800. DNA fragments of the correct size
(~500 bp) were purified from agarose gels using GeneClean II (BIO 101 Inc.,
Vista, CA). The purified fragments were ligated into pBS-SK+ (Stratagene)
using T4 DNA ligase (New England Biolabs). Double-stranded DNA was sequenced
using T3 and T7 primers and the Sequenase Version 1 kit (U. S. Biochemical
Corp./Amersham Life Science). Sequences of cloned PCR fragments were
compared with DNA sequences compiled in databases.
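As a reading aid for the degenerate primers above, the following Python sketch counts how many distinct sequences a primer written in this notation stands for. It assumes that a bracketed group such as (A/G) or (ACT) lists alternative bases at one position, that N means any of the four bases, and that inosine (I) is counted as a single universal base that adds no degeneracy. These conventions are our assumptions rather than statements from the methods, and the function name is hypothetical.

```python
import re

# Assumed notation: "(A/G)" or "(ACT)" = alternative bases at one position,
# "N" = any of A, C, G, T; "I" = inosine, counted as one universal base.
_TOKEN = re.compile(r"\(([ACGT/]+)\)|([ACGTNI])")


def primer_degeneracy(primer: str) -> int:
    """Return the number of distinct sequences encoded by a degenerate primer."""
    seq = primer.upper().replace("5'-", "").replace("-3'", "")
    fold = 1
    pos = 0
    while pos < len(seq):
        m = _TOKEN.match(seq, pos)
        if m is None:
            raise ValueError(f"unrecognized symbol at position {pos}: {seq[pos]!r}")
        group, base = m.groups()
        if group is not None:
            # Count the distinct bases listed inside the brackets.
            fold *= len(set(group.replace("/", "")))
        elif base == "N":
            fold *= 4
        pos = m.end()
    return fold
```

Under these assumptions, the two primers as printed above encode 32 and 96 sequence variants, respectively.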

A full-length cDNA of mouse hepsin was subsequently cloned by screening a
mouse liver cDNA library (Stratagene) according to the manufacturer's
instructions. ³²P-labeled DNA probes were generated using the Prime-It II
random primer labeling kit (Stratagene) with the 500-bp cloned PCR fragment
described above as a template. The 1.8-kb cDNA obtained was sequenced as
described above using both pBluescript and internal primers.

Construction and Expression of Soluble Hepsin and Catalytically 
Inactive Hepsin — The method of site-directed mutagenesis as described 
previously (28, 29) was used to introduce a StuI restriction site at the
end of the coding sequence of the transmembrane domain of hepsin
using the oligonucleotide 5'-GTGACCATCCTAAGGCCTAGTGACCAGGAGCC-3', which
replaced nucleotides 331-336 with a StuI site.



Hepsin in Preimplantation Embryos 



31317 



[Fig. 2 alignment panel (mouse, rat, and human hepsin; mouse residues 1-417) not reproduced]

Fig. 2. Sequence alignment of mouse, rat, and human hepsin. Deduced amino acid sequences of mouse, rat, and human hepsin are shown.
Amino acid identity is indicated by a dash. The conserved TM domain and Asp346 are boxed.
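Because the rat and human rows in Fig. 2 record only the residues that differ from the mouse sequence, identity has to be recovered from the dash notation. The sketch below is a minimal illustration, assuming each comparison row has already been padded to the length of the reference row with no insertion gaps. The helper names and the short example sequences are hypothetical, not rows taken from the figure.

```python
def expand_dash_row(reference: str, row: str) -> str:
    """Rebuild a full sequence from an alignment row in which '-' marks identity."""
    if len(reference) != len(row):
        raise ValueError("rows must be the same length as the reference")
    return "".join(ref if obs == "-" else obs for ref, obs in zip(reference, row))


def percent_identity(reference: str, row: str) -> float:
    """Percentage of positions at which the row matches the reference residue."""
    expanded = expand_dash_row(reference, row)
    same = sum(a == b for a, b in zip(reference, expanded))
    return 100.0 * same / len(reference)


# Hypothetical 10-residue example: two substitutions -> 80% identity.
print(expand_dash_row("VAALIVGTLL", "-----A---V"))   # VAALIAGTLV
print(percent_identity("VAALIVGTLL", "-----A---V"))  # 80.0
```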



This StuI site and the XbaI site at the 3' end of the cDNA in pBS-SK+
were used to excise a 1.1-kb DNA fragment, which was cloned into the same
sites in the RSV-PL4 expression vector (30). This construct included a
transferrin signal peptide, followed by an amino-terminal epitope tag
recognized by HPC4, a calcium-dependent monoclonal antibody (31).
The soluble hepsin expressed using this vector had a new amino terminus of
Glu-Asp-Gln-Val-Asp-Pro-Arg-Leu-Ile-Asp-Gly-Lys-Ile-Glu-Gly-Ser-Pro,
followed by the wild-type hepsin sequence from Ser*''. The nonfunctional
S352A soluble hepsin mutant, in which the active-site serine was replaced
with alanine, was constructed similarly with the additional use of the
oligonucleotide 5'-TGCCAGGGCGACGCTGGGGGCCCCTTTGTG-3'. The resulting
constructs were transfected into human 293 epithelial cells using
LipofectAMINE (Life Technologies, Inc.) as suggested by the manufacturer.
High-expressing clones were selected using 400 µg/ml G418 (Life
Technologies, Inc.). The accuracy of the constructs was confirmed by DNA
sequencing. The recombinant epitope-tagged protein was purified from
conditioned medium by affinity chromatography on HPC4-linked Affi-Gel 10
and eluted with EDTA.

Assay of Soluble Hepsin Activity — Soluble hepsin amidolytic activity 
was assayed using the chromogenic substrate Spectrozyme PCa
(H-D-Lys(γ-Cbo)-Pro-Arg-pNA; American Diagnostica) at a final concentration
of 0.2 mM. The absorbance at 405 nm was monitored over 10 min using a Vmax
microplate reader (Molecular Devices) to determine the rate of chromogenic
substrate hydrolysis (ΔA405/min). Inhibitory dose-response curves were
generated by preincubating the enzyme with specific inhibitors at different
concentrations for 30 min at ambient temperature before adding the
substrate.
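The methods do not say how the rate was extracted from the 10-min absorbance trace. One common approach, sketched here under that assumption, is a least-squares slope of A405 against time over the linear part of the trace, with inhibitor effects then expressed as activity remaining relative to an uninhibited control. The function names and example readings are hypothetical.

```python
from typing import Sequence


def initial_rate(times_min: Sequence[float], a405: Sequence[float]) -> float:
    """Least-squares slope of A405 versus time, i.e. the rate in A405 units/min."""
    n = len(times_min)
    if n != len(a405) or n < 2:
        raise ValueError("need at least two paired readings")
    mean_t = sum(times_min) / n
    mean_a = sum(a405) / n
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_min, a405))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return sxy / sxx


def residual_activity(rate_inhibited: float, rate_control: float) -> float:
    """Activity remaining (%) after preincubation with an inhibitor."""
    return 100.0 * rate_inhibited / rate_control


# Hypothetical trace read every 2 min: A405 rises by 0.012 units per minute.
times = [0, 2, 4, 6, 8, 10]
readings = [0.05 + 0.012 * t for t in times]
print(round(initial_rate(times, readings), 6))   # 0.012
print(round(residual_activity(0.003, 0.012), 6))  # 25.0
```

A straight-line fit is only appropriate while substrate depletion is negligible; if the trace curves, the fit should be restricted to the early readings.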

Semiquantitative RT-PCR and Southern Blot Analysis — RT-PCR-linked
Southern blot analysis, which increases detection sensitivity, was used to
investigate the temporal expression of hepsin in mouse preimplantation
embryos. cDNAs from various stages of development were prepared from 40-50
embryos as described above. Oocytes were collected from unmated females and
treated with hyaluronidase (Sigma) to remove cumulus cells before total RNA
isolation and cDNA synthesis as above. PCR was performed essentially as
above with the mouse hepsin primers 5'-ATCCAGCCAGTGTGTCTCCCTG-3' and
5'-TCAGGGCTGAGTCACCATGCCAC-3', but with only 15 cycles. Similar PCR
reactions using β-actin primers (a gift from Jeff Gimble, Department of
Surgery, University of Oklahoma Health Sciences Center) were used as
positive controls. Southern blot analysis of the PCR products was performed
as described previously (30) using ³²P-labeled random-primed DNA probes
generated from the same amplified DNA regions as templates.
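Using only 15 cycles presumably keeps amplification in its exponential phase, where product scales with the amount of starting template, and the β-actin reactions provide a reference for template input. The text does not describe how band signals were quantified. The sketch below assumes densitometric band intensities are available and shows one conventional way to express hepsin signal relative to β-actin. Stage names, intensities, and function names are hypothetical.

```python
from __future__ import annotations


def amplicon_copies(start_copies: float, efficiency: float, cycles: int) -> float:
    """Idealized exponential-phase yield: N = N0 * (1 + E) ** n."""
    return start_copies * (1.0 + efficiency) ** cycles


def normalize_to_actin(hepsin: dict[str, float], actin: dict[str, float]) -> dict[str, float]:
    """Divide each hepsin band by the beta-actin band from the same stage,
    then scale so that the highest stage equals 1.0."""
    ratios = {stage: hepsin[stage] / actin[stage] for stage in hepsin}
    peak = max(ratios.values())
    return {stage: r / peak for stage, r in ratios.items()}


# Hypothetical band intensities (arbitrary densitometry units).
hepsin_bands = {"2-cell": 40.0, "8-cell": 0.0, "blastocyst": 120.0}
actin_bands = {"2-cell": 100.0, "8-cell": 80.0, "blastocyst": 120.0}
print(normalize_to_actin(hepsin_bands, actin_bands))
print(amplicon_copies(1, 1.0, 15))  # 32768.0 copies per template at 100% efficiency
```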

Northern Blot Analysis — Total RNA was isolated from cells according 
to published methods (27). RNA was transferred to MSI-NT nylon
membranes by capillary action, then cross-linked to the membranes with
UV light. Membranes were incubated for 1 h at 60 °C in prehybridization
buffer (500 mM NaPO4, pH 7.4, 7% SDS, 1 mM EDTA). The membranes were then
hybridized overnight at 60 °C in prehybridization buffer plus labeled cDNA
probe. Probes were ³²P-labeled by random priming using a Prime-It II kit
(Stratagene), then separated from unincorporated label using ProbeQuant
G-50 Micro columns (Pharmacia Biotech). After three low-stringency washes
(15 min in 40 mM NaPO4, pH 7.2, 5% SDS, 1 mM EDTA, 0.5% bovine serum albumin
at room temperature), two high-stringency washes (15 min in 40 mM NaPO4,
pH 7.2, 1% SDS, 1 mM EDTA at 60 °C), and one 30-min high-stringency wash,
membranes were exposed to x-ray film adjacent to an enhancing screen.

Fig. 3. Temporal expression of hepsin in mouse preimplantation embryos.
Total RNA from mouse embryos was isolated, then analyzed for hepsin mRNA
expression by Southern blot-linked RT-PCR analysis (n = 3). β-Actin was used
as a control.
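The buffer compositions in the Northern blot protocol are final concentrations. One quick way to turn them into a recipe is the dilution relation C1V1 = C2V2 applied to each stock. The sketch assumes hypothetical stock solutions (1 M sodium phosphate, pH 7.4; 20% SDS; 0.5 M EDTA) that are not specified in the text, and the function name is our own.

```python
from __future__ import annotations


def stock_volumes(final: dict[str, float], stock: dict[str, float],
                  total_ml: float) -> dict[str, float]:
    """C1*V1 = C2*V2: volume (ml) of each stock needed for total_ml of buffer.
    Final and stock concentrations must share units for each component."""
    vols = {name: final[name] * total_ml / stock[name] for name in final}
    water = total_ml - sum(vols.values())
    if water < 0:
        raise ValueError("stocks are too dilute for this final volume")
    vols["water"] = water
    return vols


# Prehybridization buffer: 500 mM NaPO4, 7% SDS, 1 mM EDTA (assumed stocks).
recipe = stock_volumes(
    final={"NaPO4_mM": 500, "SDS_pct": 7, "EDTA_mM": 1},
    stock={"NaPO4_mM": 1000, "SDS_pct": 20, "EDTA_mM": 500},
    total_ml=50,
)
print({k: round(v, 3) for k, v in recipe.items()})
```

With these assumed stocks, 50 ml of prehybridization buffer needs 25 ml of phosphate stock, 17.5 ml of 20% SDS, 0.1 ml of 0.5 M EDTA, and 7.4 ml of water.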

RESULTS 

Strategy for the Identification and Cloning of an Embryonic 
Serine Protease — A prior study using a radioiodinated active 
site chloromethyl ketone probe and SDS-PAGE detected a single serine
protease of approximately 74 kDa in mouse blastocyst lysates (26). Using
RT-PCR and degenerate oligonucleotides based on conserved regions in the
catalytic domain of serine proteases, we amplified and subcloned a 0.5-kb
cDNA fragment encoding the putative mouse hatching enzyme from hatching
blastocyst mRNA. Ten separate clones were sequenced and found to be
identical. Database searches showed that the deduced amino acid sequence
was similar to that of human hepsin, a trypsin-like serine protease
previously cloned from a liver library (33). A full-length mouse hepsin
cDNA (Fig. 1) was obtained by screening a mouse liver library with the
amplified DNA fragment as a probe. Hepsin is a type II transmembrane
protein with an extracellular carboxyl-terminal catalytic domain (33, 34).
Based on predicted amino acid sequence homology with other related serine
proteases, hepsin is likely to be synthesized as a single-chain zymogen that
requires cleavage of the Arg161-Ile162 bond to generate the mature,
disulfide-linked two-chain form. In addition to the catalytic triad residues
and Asp346, which is important for trypsin-like specificity, the
transmembrane and short cytoplasmic domains are conserved among mouse, rat,
and human hepsin (Fig. 2). The significance of the transmembrane domain
remains to be determined.
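Type II transmembrane proteins such as hepsin carry a single hydrophobic internal signal-anchor segment close to the amino terminus. A standard way to spot such a segment from sequence alone is a Kyte-Doolittle hydropathy scan. The sketch uses the published Kyte-Doolittle scale with a 19-residue window and a mean-score threshold of 1.6, a common heuristic that is not described as the authors' method; the function names are ours and the test sequence is synthetic.

```python
from __future__ import annotations

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full-length window along the sequence."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(seq: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean score exceeds threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, score in enumerate(profile) if score > threshold]


# Synthetic test: a 19-residue leucine stretch flanked by charged residues.
demo = "D" * 10 + "L" * 19 + "D" * 10
print(max(hydropathy_profile(demo)))   # about 3.8
print(candidate_tm_windows(demo))       # windows starting at 6 through 16
```

Overlapping hits like these are usually merged into one predicted segment; a real scan would be run on the hepsin sequence in Fig. 1.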

Temporal Expression of Hepsin in Preimplantation Embryos — To
determine whether the temporal expression of hepsin was consistent with
that of a hatching enzyme, we performed semiquantitative RT-PCR-linked
Southern blotting to estimate indirectly the timing and level of hepsin
message in oocytes and at several stages of preimplantation development.
Hepsin transcription was biphasic: it began at the 2-cell stage, was absent
at the 8-cell stage, and peaked at the early blastocyst stage prior to
hatching (Fig. 3). There was no detectable expression in oocytes, and after
embryo hatching the level of expression clearly diminished (Fig. 3).

Tissue Expression and Multiple Hepsin mRNAs — Human 
hepsin was previously shown to be expressed primarily in liver and kidney,
and mouse hepsin was similarly distributed (Fig. 4). Unlike human hepsin,
mouse hepsin was detected by Northern blotting as two alternative forms,
migrating at 1.8 and 1.9 kb. To characterize the differences between the
two hepsin mRNAs, we performed RT-PCR analysis using total RNA samples
isolated from mouse liver and kidney. Several oligonucleotide primers
spanning the hepsin cDNA sequence were used, as shown in Fig. 5. PCR
analysis revealed that an insert in the 5'-end of the coding sequence
distinguished the 1.9-kb message from the 1.8-kb message. DNA sequencing
revealed an additional 60-bp sequence coding for 20 amino acids within the
cytoplasmic domain of the 1.9-kb hepsin cDNA (Fig. 6). This sequence has not
been found in human hepsin.

Fig. 4. Tissue distribution of mouse hepsin expression. Total RNA
(20 µg/lane) from several adult mouse tissues was analyzed for hepsin
expression by Northern blots hybridized with a cDNA consisting of the
entire hepsin coding region. Two hybridizing species, detected most strongly
in liver and kidney, correspond to mRNAs of approximately 1.8 and 1.9 kb.
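The statement that a 60-bp insertion codes for 20 amino acids follows from the triplet code: 60 is a multiple of 3, so the insertion adds 20 codons without shifting the downstream reading frame. The sketch below checks this and translates short stretches with the standard genetic code. The example inputs are sequences printed in this section (the start of the 1.8-kb cytoplasmic domain in Fig. 6, and primers C and E from the Fig. 5 legend); the reading frames used for the primers are our own choice, made to match the hepsin coding sequence, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence codon by codon (trailing bases ignored)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def insertion_keeps_frame(length_bp: int) -> bool:
    """An insertion preserves the downstream frame if its length is a multiple of 3."""
    return length_bp % 3 == 0


print(insertion_keeps_frame(60), 60 // 3)                  # True 20
print(translate("ATGGCGAAGGAG"))                            # MAKE (Fig. 6, 1.8-kb form)
print(translate("AGGAAGCTGCCGGTGGACCGCATTGTG"))             # RKLPVDRIV (primer C)
print(translate(reverse_complement("TCAGGGCTGAGTCACCATGCCAC")[2:]))  # GMVTQP* (primer E)
```

Read this way, primer C spans the region around the Arg-Ile activation bond, and the reverse complement of primer E ends in a stop codon, consistent with it annealing at the 3' end of the coding sequence.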

Expression and Autoactivation of Soluble Hepsin — Because 
hepsin is a type II transmembrane serine protease, we wanted to determine
whether a soluble form of the enzyme could be expressed and used to
elucidate hepsin's enzymatic properties. Using site-directed mutagenesis, we
developed an expression construct encoding a zymogen form of hepsin lacking
its transmembrane and cytoplasmic domains (soluble hepsin), and stably
expressed it in human 293 epithelial cells. Soluble hepsin was expected to
be expressed as a single-chain zymogen that could be activated
proteolytically to a disulfide-linked two-chain form consisting of a 12-kDa
light chain and a 31-kDa heavy chain. Under nonreducing conditions, both the
intact precursor and the proteolytically activated species would therefore
be expected to migrate at ~43 kDa on SDS-PAGE gels. Surprisingly, upon
elution, soluble hepsin was spontaneously activated from a single-chain
zymogen to the active disulfide-linked two-chain form (Fig. 7, WT lanes, and
data not shown); this activation was not detected in conditioned medium that
had not been subjected to purification (Fig. 7, CM lanes). Additionally, the
enzyme further processed itself from a 43-kDa to a 29-kDa form (Fig. 7,
nonreduced WT lane). Upon reduction, only a 31-kDa band, representing the
heavy or catalytic chain, was seen, suggesting that only the light chain was
proteolytically modified to generate the 29-kDa form seen under nonreducing
conditions. The autoactivation of soluble hepsin upon elution was not seen
with a catalytically inactive S352A soluble hepsin mutant, in which the
active-site serine was replaced by alanine (Fig. 7, S352A lanes). Of note,
the initial eluate, when prepared immediately and separated by reducing
SDS-PAGE, showed only a small amount of conversion to the two-chain form
(data not shown). Similarly, the presence of the inhibitor benzamidine in
the eluate prevented the conversion, and only a small converted fraction was
seen on reducing SDS-PAGE (data not shown).

Fig. 5. Localization of the region of nucleotide insertion in the
1.9-kb hepsin message. Total RNA from mouse kidney and liver was subjected
to RT-PCR analysis using different primer sets (each primer is denoted by a
letter from A to E) to localize the region of nucleotide differences between
the 1.8- and 1.9-kb hepsin mRNAs. The positions of the primers (arrows) are
indicated along the 5'-to-3' nucleotide sequence, represented by a
horizontal bar above the gel image. The position of the nucleotide insertion
is also marked. PCR products were separated by 1% agarose electrophoresis
and stained with ethidium bromide. Primer set A/B detected two different
bands due to the 60-bp insertion in the coding region for the cytoplasmic
domain of hepsin. Primers were as follows: A,
5'-TGGGAATCATTAACAAGAGTCCCTGAC-3'; B, 5'-AGTCAGGAATCGGCCTCTAGG-3'; C,
5'-AGGAAGCTGCCGGTGGACCGCATTGTG-3'; D, 5'-ATCCAGCCAGTGTGTCTCCCTG-3'; E,
5'-TCAGGGCTGAGTCACCATGCCAC-3'.

[Fig. 6 sequence panel not reproduced]

Fig. 6. Alternative cytoplasmic domains in the two hepsin mRNAs. Amino acid
and cDNA sequence of the hepsin cytoplasmic domain, with the inserted
sequence within the 1.9-kb form shown above the 1.8-kb form of hepsin.

DISCUSSION 

We have identified hepsin, a membrane-bound serine prote- 
ase previously shown to activate fVII (35), in preimplantation 
mouse embryos as early as the two-cell stage. Based on evi- 
dence that a single serine protease is present in preimplanta- 
tion embryos (26), it is possible that hepsin represents the first 
such protease expressed during development. Prior in vitro 
experimentation implicated hepsin in the maintenance of cel- 
lular morphology and hepatoma cell growth (36), and in blood 
coagulation by human factor VII activation (35). Increased 
hepsin expression has also been associated with ovarian cancer 
(37). No developmental functions of hepsin have been de- 
scribed. Whether hepsin plays a critical role in early develop- 
ment is not clear, but it is possible that it plays a role in 
blastocyst hatching. 
Fig. 7. Soluble hepsin is capable of autoactivation. Wild-type and S352A
soluble hepsin were isolated from medium conditioned by transfected 293
epithelial cells, and proteins were separated by both nonreducing and
reducing SDS-PAGE and blotted to nitrocellulose membrane. The primary HFP-2
and anti-goat alkaline phosphatase-conjugated antibodies were used to
visualize hepsin in conditioned medium (CM), as well as purified soluble
hepsin (WT) and its inactive mutant (S352A). Molecular mass markers are
shown in kDa.
The hepsin amino acid sequence suggests that it is a type II transmembrane
serine protease zymogen with an extracellular carboxyl-terminal catalytic
domain. The internal signal sequence, serving as a transmembrane domain, is
surprisingly conserved. The presence of this transmembrane domain is
consistent with the data of Perona and Wassarman (26) suggesting that the
putative mouse hatching enzyme, which would be expressed in early
preimplantation embryos, is membrane-bound. Asp346, which confers trypsin
specificity by lining the S1 subsite and forming part of the specificity
pocket, is present and conserved, indicating that hepsin is likely to have
trypsin-like specificity. Indeed, our activity assays of recombinant soluble
hepsin with a number of chromogenic substrates have confirmed this
prediction. The reason for the presence of two forms of hepsin differing in
the cytoplasmic domain is not clear. The inserted sequence in the 1.9-kb
form of hepsin has no homology to any domains found in signal-transducing
proteins. It is unlikely that changes to the cytoplasmic domain alter
hepsin's proteolytic properties, particularly since the soluble form of the
enzyme, which lacks the cytoplasmic domain, is apparently fully functional.
Whether the 1.8- and 1.9-kb hepsin mRNAs are the products of two different
genes or, more likely, of alternative splicing of a single gene transcript
remains to be determined.

Since hepsin is likely to be expressed as a zymogen based on the predicted
amino acid sequence, and appears to be the only serine protease present
during blastocyst hatching, the question arises: what is the mechanism of
its activation? Our hypothesis is that density-dependent autoactivation
occurs, as suggested by our soluble hepsin expression study. We noted that
during purification, upon elution with EDTA, soluble hepsin was
spontaneously converted to the active, disulfide-linked two-chain form,
probably via cleavage of the Arg161-Ile162 bond. The conversion was clearly
concentration-dependent (activation was seen only in the eluate and not in
the diluted conditioned medium) and required hepsin's inherent enzymatic
activity, since it was not observed with the catalytically inactive S352A
soluble hepsin mutant. These data indicate that hepsin is capable of
concentration-dependent autoactivation. Since hepsin is anchored in the
membrane by a transmembrane domain, its density and lateral diffusion on the
trophoblast surface may play an important role in achieving the
concentration needed for autoactivation (Fig. 8). This mode of
autoactivation resembles the cell surface autoactivation of fVII, which uses
distinct tissue factor molecules to localize both fVII and fVIIa to the cell
surface, forming two separate membrane-bound binary complexes. The complex
containing active fVIIa then activates the adjacent tissue factor-anchored
fVII, obeying obligatory two-dimensional enzyme kinetics (38). Hepsin
autoactivation is likely to follow similar kinetics, but further studies are
necessary to elucidate its mechanism of cell surface autoactivation.
Interestingly, the recent purification of intact hepsin from rat liver
microsomes also resulted in its activation (39), but it was not clear
whether this was the result of autoactivation or of the action of another
protease. Our data with the inactive hepsin mutant suggest that
membrane-bound hepsin is capable of autoactivation.

Fig. 8. Model of hepsin activation. Based on structural similarities to
other serine proteases, hepsin is expressed as a single-chain zymogen and
can be activated proteolytically by a single cleavage at the Arg161-Ile162
bond to generate the two-chain, membrane-bound form. Its deduced primary
amino acid sequence suggests that hepsin is expressed as a type II
transmembrane zymogen with an extracellular carboxyl-terminal catalytic
domain. The heavy or catalytic chain is linked to the light chain via a
disulfide bond (C-C). The light chain is anchored to the cell membrane by a
hydrophobic, internal signal sequence. Based on the soluble hepsin
expression studies, the mode of activation on the cell surface is likely to
be autoactivation. Our evidence further suggests that a soluble form,
resulting from additional cleavages of the membrane-bound light chain, is
possible.
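To make the idea of concentration-dependent autoactivation concrete, the following sketch integrates a deliberately simple autocatalytic scheme in which active enzyme converts zymogen at a rate proportional to the product of their concentrations, plus a small basal conversion rate so that the reaction can start. This is an illustrative assumption of ours, not a model proposed or fitted by the authors; it ignores the two-dimensional membrane kinetics discussed above, and the rate constants, units, and function name are arbitrary.

```python
def half_conversion_time(total: float, k_auto: float = 1.0, k_basal: float = 1e-3,
                         dt: float = 1e-2, t_max: float = 1e4) -> float:
    """Time until half of the zymogen has been converted, integrating
        dE/dt = (k_basal + k_auto * E) * Z,   with Z = total - E,
    by forward-Euler steps. All quantities are in arbitrary units."""
    active, t = 0.0, 0.0
    while active < total / 2:
        zymogen = total - active
        active += (k_basal + k_auto * active) * zymogen * dt
        t += dt
        if t > t_max:
            raise RuntimeError("half conversion not reached")
    return t


for conc in (1.0, 10.0):
    print(conc, round(half_conversion_time(conc), 2))
```

Raising the total concentration tenfold shortens the half-conversion time several-fold in this toy model (roughly 7 versus 1 arbitrary time units with the default constants), which mirrors the observation that conversion occurred in the concentrated eluate but not in dilute conditioned medium.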

The autoactivation of soluble hepsin additionally generated a second form
of the enzyme. A 29-kDa band, which was absent in the S352A mutant, and the
intact 43-kDa band were both present when the eluate was analyzed by
nonreducing SDS-PAGE and Western blotting. This 29-kDa form was likely the
result of proteolytic modification of the light chain of the active
two-chain form, since only the intact catalytic heavy chain was seen under
reducing conditions. The presence of this 29-kDa form suggests that
membrane-bound hepsin can be cleaved off the trophoblast surface of embryos
(Fig. 8). Interestingly, Sawada et al. (40) have demonstrated the presence
of a soluble trypsin-like activity in blastocyst culture medium and that
this activity represented that of a hatching enzyme. Whether this secreted
trypsin-like activity and the 29-kDa form of hepsin are one and the same,
and what role they may play during embryogenesis, remain to be determined.
Abstract

We report the characterization of a novel serine protease of the
chymotrypsin family, recently isolated by cDNA-representational difference
analysis as a gene overexpressed in pancreatic cancer. The 2.3-kb mRNA of
the gene, named TMPRSS3, is strongly expressed in a subset of pancreatic
cancers and in various other cancer tissues, and its expression correlates
with the metastatic potential of the clonal SUIT-2 pancreatic cancer cell
lines. The deduced polypeptide sequence consists of 437 amino acids and
exhibits all of the structural features characteristic of serine proteases
with trypsin-like activity. TMPRSS3 is membrane-bound, with an NH2-terminal
signal-anchor sequence and a glycosylated extracellular region containing
the serine protease domain. Thus, TMPRSS3 is a novel membrane-bound serine
protease overexpressed in cancer, which may be important for processes
involved in metastasis formation and tumor invasion.

Introduction 

Proteases have been increasingly recognized as important factors in 
the pathophysiology of tumorous diseases. The proteolytic degrada- 
tion of the extracellular matrix, which is an indispensable step in 
tumor invasion and metastasis, is mediated by members of the four 
major classes of endopeptidases, including serine, cysteine, aspartyl, 
and metalloproteases (1). In this highly complicated process, a cas- 
cade of events requiring a variety of proteases seems to be involved. 
Numerous reports have demonstrated an increased production of 
extracellular matrix degrading enzymes, including type IV collagen- 
ase (MMP-2), cathepsin B, cathepsin D, and serine proteases such as 
plasminogen activator in tumor cells (1). The proteolytic enzymes of 
the serine protease family exist as single-chain or double-chain zy- 
mogens activated by specific and limited proteolytic cleavage. They 
contain the three active-site amino acids histidine, aspartate, and 
serine, which participate in peptide bond hydrolysis. The geometric 
orientation of this catalytic triad is similar in different serine 
proteases, despite the fact that folding of the proteases may be 
different (2). 

In the present study, we report the cloning and characterization of a
novel serine protease identified in a recent cDNA-RDA approach (3). That
study was designed to isolate gene fragments highly overexpressed in
pancreatic cancer compared with normal pancreas and chronic pancreatitis
tissue. From the 16 gene fragments isolated in that study, we selected the
313-bp gene fragment RDA12 (GenBank accession no. U54603) for further
characterization. Database comparison revealed moderate homology to a
number of serine proteases, indicating that RDA12 may be a fragment of a
novel protease with cancer-specific expression.

Materials and Methods

Materials. Human tissue from patients with ductal adenocarcinoma of the
pancreas (n = 13), carcinoma tissues of different origin, human pancreatic
tissue from organ donors (n = 6), and chronic pancreatitis tissue (n = 6)
were provided by the Hungarian Academy of Sciences (Budapest, Hungary) and
the Department of Surgery of the University of Ulm. All tissue samples were
obtained after approval by the local Ethics Committee.

The human pancreatic cancer cell lines were obtained from the following
suppliers: PATU-8988S and PATU-8988T (German Collection of Microorganisms
and Cell Cultures, Braunschweig, Germany); PANC-1 and MIA PaCa-2 (European
Collection of Animal Cell Cultures, Salisbury, United Kingdom); HPAF
(Metzgar, Durham, NC); Capan-1, Capan-2, and AsPC-1 (Cell Lines Service,
Heidelberg, Germany); Patu II (Elsasser, Marburg, Germany); PC2 (Bulow,
Mainz, Germany); SUIT-2 (S2-007, S2-013, S2-020, and S2-028; Iwamura,
Miyazaki, Japan; Ref. 4); and SKPC2 and IMIM-PC2 (P. Real, IMIM, Barcelona,
Spain).

Cloning of a New Serine Protease cDNA. In a recent screen for
differentially expressed genes in pancreatic carcinoma, the 313-bp gene
fragment RDA12 (accession no. U54603) was isolated by cDNA-RDA (3); this
fragment encodes the putative motif of a new serine protease. The RDA12
fragment was used to screen ~20,000 clones of an oligo(dT)-primed cDNA
library from a pancreatic cancer cell line by hybridization. Both strands of
the longest cDNA clone, RDA12/2, were sequenced by primer walking. For
stable transfection in mammalian cells, the cDNA clone RDA12/2 was cloned in
sense and antisense orientation into the BamHI site of the mammalian
expression vector pHβ-Apr1-neo (5). A COOH-terminal-tagged TMPRSS3
expression vector was constructed by inserting a 1427-bp fragment
(nucleotides 96-1522) containing the open reading frame of TMPRSS3 into the
BstXI site of the mammalian expression vector pcDNA6/V5/His B (Invitrogen,
San Diego, CA).
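Nucleotide ranges in this paper use 1-based, inclusive coordinates, so a fragment's length is its end position minus its start position plus one. A one-line helper (hypothetical name) makes the cross-check against the stated sizes explicit.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive nucleotide interval such as 96-1522."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


print(span_length(96, 1522))    # 1427-bp expression fragment
print(span_length(1189, 1501))  # 313-bp RDA12 fragment (Fig. 1 legend)
```

Forgetting the "+ 1" is a common off-by-one error when converting coordinates to fragment sizes.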

Northern Blot Analyses. The expression of TMPRSS3 was studied by
hybridization of Northern blots containing 30 µg each of total RNA from
normal pancreas tissue, chronic pancreatitis tissue, different carcinoma
tissues, and cell lines. Northern blots containing RNA of different human
tissues were purchased from Clontech (Heidelberg, Germany).

Cell Culture and Transfection. For functional analysis of TMPRSS3, the
S2-020 pancreatic cancer cell line, which expresses no endogenous TMPRSS3
mRNA, was transfected with the TMPRSS3-pHβ-Apr1-neo construct in sense and
antisense orientation using DMRIE-C (Life Technologies, Inc., Eggenstein,
Germany). Several clones showing various degrees of stable TMPRSS3
sense/antisense mRNA expression were picked. Two sense and two antisense
clones were used for functional assays.

HEK-293 cells were plated at 1.5 x 10** cells/10-cm dish and grown
overnight in DMEM supplemented with 10% FCS. Cells were transiently
transfected with the TMPRSS3-pcDNA6/V5/His plasmid DNA using the calcium
phosphate protocol.
[Fig. 1 sequence panel not reproduced]

Fig. 1. Nucleotide sequence of the cDNA coding for human TMPRSS3 and its predicted amino acid sequence. The bold nucleotide sequence 1189-1501 represents the initially isolated RDA12 gene fragment; the underlined nucleotides 2045-2050 mark the potential polyadenylation signal. The amino acid sequence highlighted by a gray box represents the potential transmembrane domain. ▲ indicates the active-site residues histidine (H), aspartate (D), and serine (S). Double underlines indicate potential N-linked glycosylation sites.
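Each amino acid row in Fig. 1 is the codon-by-codon translation of the nucleotide row above it, and the row numbers follow from the 214-bp 5' untranslated region (codon 1 begins at nucleotide 215). A minimal sketch, assuming the standard genetic code (NCBI translation table 1), reproduces row 211; the helper names `translate` and `codon_start` are illustrative, not taken from the paper.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering of the
# first, second and third codon positions; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(nt: str) -> str:
    """Translate an in-frame nucleotide string, one codon at a time."""
    nt = nt.upper().replace(" ", "")
    if len(nt) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt), 3))

UTR5 = 214  # length of the 5' untranslated region given in the text

def codon_start(residue: int) -> int:
    """First nucleotide (Fig. 1 numbering) of the codon for a given residue."""
    return UTR5 + 3 * (residue - 1) + 1

# Nucleotide row 845 of Fig. 1 (encodes residues 211-240).
row_845 = ("GCCTCTGTGGATTCTTGGCCTTGGCAGGTC"
           "AGCATCCAGTACGACAAACAGCACGTCTGT"
           "GGAGGGAGCATCCTGGACCCCCACTGGGTC")
print(translate(row_845))  # ASVDSWPWQVSIQYDKQHVCGGSILDPHWV
print(codon_start(211))    # 845
```

The same arithmetic explains the other row labels: residue 421 starts at nucleotide 1475, and the ORF of 437 codons ends at nucleotide 1525, just before the TAA stop codon.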




Preparation of Cell Extracts and Subcellular Fractionation. Forty-eight hours after transient transfection of HEK-293 cells with V5-tagged TMPRSS3, protein extracts were prepared by resuspending pelleted cells in 1% Triton X-100, 1% sodium deoxycholate, 0.1% SDS, 150 mM NaCl, 50 mM Tris-HCl (pH 7.2) supplemented with 5 µg/ml aprotinin, 5 mM Pefabloc, and 10 µg/ml pepstatin. For immunopurification of the epitope-tagged protein, cell lysates were incubated with V5 antibody conjugated to protein G-agarose beads at 4°C for 4 h on a shaker. The agarose beads were pelleted by centrifugation and washed twice with 150 mM NaCl, 5 mM EDTA, 50 mM Tris + 0.1% NP40. The washed pellets were resuspended in 150 mM NaCl, 5 mM EDTA, 50 mM Tris + 0.1% NP40 for PNGase F treatment.

Subcellular fractions were prepared from transiently transfected HEK-293 cells as reported previously (6). The plasma membrane-enriched fraction, which was prepared using sucrose density gradient centrifugation, the cytosolic fraction, and concentrated culture medium were studied by Western blot analysis.



Glycosylation. For PNGase F treatment, immunopurified protein was incubated overnight at 37°C with 2 units of PNGase F supplemented with 10 mM EDTA. Inhibition of N- and mucin-like O-glycosylation was performed by cultivating TMPRSS3-expressing HEK-293 cells for 24 h in DMEM, 10% FCS containing either 2.5 µg/ml tunicamycin (7) or 2 mM phenyl-N-acetyl-α-D-galactosaminide (8). Thereafter, cells were harvested for protein extraction.

Functional Assays. Nude mouse experiments were done by injecting 2 X 10* S2-020 cells stably transfected with TMPRSS3 sense/antisense constructs, both s.c. and into the tail vein of female nu/nu mice. Five weeks after the tail vein injections, the lung, spleen, and liver were used for standard histological analysis to identify the presence or absence of metastatic lesions. Subcutaneous tumors were measured and used for histological analysis.

In vitro Matrigel invasion assays were done by seeding 10* transfected cells in medium + 1% FCS in the upper chamber of Matrigel-coated 8-µm transwell plates. The lower chamber was filled with medium + 10% FCS. The number of invading cells adhering to the lower side of the porous membrane was counted after fixation with 4% paraformaldehyde and staining with methylene blue.
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
Fig. 2. Northern blot analyses of the TMPRSS3 transcript in different tissues and cell lines. The Northern blots contain 30 µg of total RNA per lane from normal human pancreas (n = 6), chronic pancreatitis tissue (n = 6), pancreatic carcinoma tissue (n = 13; Lanes 1-13), and cancer tissues of different origin (Lanes 14-16, 19-21, and 23, colorectal carcinoma; Lanes 17 and 25-27, gastric cancer; Lane 22, soft tissue sarcoma; Lane 18, breast cancer; Lane 24, carcinoma of the papilla Vateri) and the SUIT-2 subclones S2-028, S2-013, and S2-007. RNAs from normal pancreas, chronic pancreatitis, and pancreatic cancer tissue samples were run on the same Northern blot gels. The autoradiographs for cancer and control tissues are shown separately for improved presentation of the data.
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

The proteolytic activity in TMPRSS3 sense/antisense-transfected S2-020 cells and transiently transfected HEK-293 cells was determined fluorometrically in native lysates and in lysates treated with enterokinase for activation, using oligopeptide substrates for elastase-like (Ala-Ala-Ala-Ala) and trypsin-like (Ile-Pro-Arg) serine proteases as described previously (9).

Chromosomal Mapping of the TMPRSS3 Gene Locus. The chromosomal localization of TMPRSS3 was determined by screening the GeneBridge4 radiation hybrid panel (Research Genetics, Huntsville, AL), using the TMPRSS3-specific primers 5'-CATGTGGTGGGCATCGTTA-3' and 5'-CCAGTTGAGATAGGCTGAG-3'.
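The two mapping primers can be located directly in the Fig. 1 cDNA: the forward primer occurs as written, and the reverse primer anneals to the opposite strand, so its reverse complement is what appears in the printed sequence. A minimal sketch, assuming rows 1385 and 1475 of Fig. 1 (with the two illegible bases in row 1385 read as G, as the printed Gly translation requires); `find_amplicon` is a hypothetical helper, not a tool used by the authors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A/C/G/T."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_amplicon(template: str, fwd: str, rev: str) -> tuple[int, str]:
    """Return (0-based start, product) for a primer pair written 5'->3'.

    The reverse primer binds the other strand, so its reverse complement is
    searched downstream of the forward primer on this strand.
    """
    fwd = fwd.upper()
    rev_site = reverse_complement(rev)
    start = template.find(fwd)
    if start < 0:
        raise ValueError("forward primer site not found")
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        raise ValueError("reverse primer site not found downstream")
    return start, template[start:end + len(rev_site)]

# Rows 1385 and 1475 of Fig. 1, i.e. nucleotides 1385-1564.
ROW_1385 = ("CTGATGTACCAATCTGACCAGTGGCATGTGGTGGGCATCGTTAGCTGG"
            "GGCTATGGCTGCGGGGGCCCGAGCACCCCAGGAGTATACACC")
ROW_1475 = ("AAGGTCTCAGCCTATCTCAACTGGATCTACAATGTCTGGAAGGCTGAG"
            "CTGTAATGCTGCTGCCCCTTTGCAGTGCTGGGAGCCGCTTCC")

start, product_seq = find_amplicon(ROW_1385 + ROW_1475,
                                   "CATGTGGTGGGCATCGTTA", "CCAGTTGAGATAGGCTGAG")
print(1385 + start, len(product_seq))  # 1409 90
```

On this reading the expected product spans nucleotides 1409-1498 (90 bp) within the catalytic domain.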

Results and Discussion 

The 313-bp fragment encoding the putative motif of a new serine protease, isolated in a recent cDNA-RDA screen for genes differentially expressed in pancreatic cancer (3), was used to screen a pancreatic cancer cDNA library. Among 16 isolated homologous clones, a clone designated RDA12/2 contained the full-length sequence. The sequence of clone RDA12/2 comprised 2071 bp, including a 214-bp 5' untranslated region, an open reading frame of 1311 nucleotides, and a 546-bp 3' untranslated region (Fig. 1). Translation of the open reading frame suggests that the cDNA codes for a putative polypeptide of 437 amino acids with an estimated molecular mass of 48.202 kDa. The NH2-terminal region of the hypothetical protein contains a putative signal-anchor sequence characteristic for group II integral membrane proteins. The highly hydrophobic region of 22 amino acids may serve as a transmembrane domain that anchors the protease to the cell membrane. According to the charge difference rule (10), it can be assumed that the COOH terminus of the protein, with its protease module, is located on the extracellular surface.
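Both topology arguments used here, a hydrophobic stretch detected by a sliding-window hydropathy average (the Kyte-Doolittle method with a 17-residue window cited for Fig. 3a) and the charge difference rule, reduce to short calculations. A minimal sketch, assuming the standard Kyte-Doolittle scale, a 15-residue flank on each side of the membrane segment for the charge difference (a common formulation, taken here as an assumption), and a hypothetical toy sequence that is NOT TMPRSS3.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy(seq: str, window: int = 17) -> list[tuple[int, float]]:
    """Mean hydropathy of each window, keyed by its centre residue (1-based)."""
    if window % 2 == 0 or window > len(seq):
        raise ValueError("window must be odd and not longer than the sequence")
    values = [KD[aa] for aa in seq.upper()]  # KeyError for non-standard residues
    half = window // 2
    return [
        (start + half + 1, round(sum(values[start:start + window]) / window, 3))
        for start in range(len(values) - window + 1)
    ]

def net_charge(segment: str) -> int:
    """Crude net charge: +1 per Lys/Arg, -1 per Asp/Glu; His treated as neutral."""
    return sum((aa in "KR") - (aa in "DE") for aa in segment.upper())

def charge_difference(seq: str, tm_start: int, tm_end: int, flank: int = 15) -> int:
    """Delta(C - N): net charge of the C-terminal flank minus the N-terminal flank.

    tm_start and tm_end are 1-based, inclusive residue numbers of the segment.
    """
    n_flank = seq[max(0, tm_start - 1 - flank):tm_start - 1]
    c_flank = seq[tm_end:tm_end + flank]
    return net_charge(c_flank) - net_charge(n_flank)

def predicted_orientation(delta: int) -> str:
    # The more positively charged flank is taken to face the cytosol.
    if delta < 0:
        return "NH2 terminus cytosolic, COOH terminus extracellular (type II)"
    if delta > 0:
        return "NH2 terminus extracellular, COOH terminus cytosolic (type I)"
    return "undetermined"

# Hypothetical toy protein: basic NH2 tail (1-11), 22 hydrophobic
# residues (12-33), acidic COOH-terminal flank (34-48).
toy = "MPKKRSRGTEK" + "L" * 22 + "AGDSEDGTPSDEQGS"
peak = max(hydropathy(toy), key=lambda p: p[1])
delta = charge_difference(toy, 12, 33)
print(peak, delta, predicted_orientation(delta))
```

For the toy sequence the hydropathy peak lies inside the hydrophobic stretch and the basic NH2-terminal flank yields a negative Delta(C - N), the pattern the text describes for a type II (group II) signal anchor.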

Although the nucleotide sequence is unique, database comparisons of the amino acid sequence revealed homology to a number of serine proteases. Thirty-five percent identity and ~50% similarity were found to members of the serine protease family known as the human transmembrane proteases, TMPRSS1/hepsin (11) or TMPRSS2 (12). Thus, our new protease is the third member of a family of transmembrane-bound serine proteases. Consequently, this new gene was named TMPRSS3 for transmembrane protease, serine 3. Sequence homology was high in the domains containing the three principal active-site amino acids H'"*', D"*", and S"**^, required for peptide bond hydrolysis. The arrangement of the catalytic residues in the linear sequence defines the membership of TMPRSS3 in the S1 family of the chymotrypsin clan SA of serine-type peptidases (2). The prototype of this family is chymotrypsin, and the three-dimensional structures of some of its members have already been resolved (12).



TMPRSS3 is predicted to cleave in a trypsin-like manner after lysine or arginine residues because it contains D"'^' at the base of the specificity pocket that binds the substrate (13). In addition, the novel protein shares considerable structural similarities with the TMPRSS family, including the putative NH2-terminal membrane anchor and the conserved cysteine residues, which by homology most likely form the disulfide bonds C'^^'^-C^^", C^-•»*^-C^^^ C^^^-C-''^ and C^«^-C*'**. Serine proteases are most commonly synthesized as inactive proenzymes, which are activated by extracellular, proteolytic removal of a propeptide. At the NH2-terminal part of the protease domain, TMPRSS3 contains the peptide sequence RVVGG, which is typical for the proteolytic activation site of many protease zymogens. The potential cleavage between R"""* and V*^"'^ would result in a new terminal α-amino group, which forms a salt bridge with D'^**^ and thereby leads to the assembly of the functional catalytic site. Therefore, the activated form would consist of a non-protease and a protease subunit linked by a disulfide bond that most likely involves C**^-C^'**. Whether this activation is mediated under physiological conditions by autocatalytic cleavage or by other proteases is not known. The TMPRSS3 gene locus was localized to chromosome 11 at q23.3, between the markers D11S4362 and D11S4387, by use of a radiation hybrid panel.
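The activation step described here is a single cut on the COOH-terminal side of the Arg in RVVGG, after which the catalytic chain begins with VVGG; a trypsin-like enzyme more generally cuts after Lys or Arg. A minimal sketch, assuming the legible residues around the activation site in Fig. 1; the helper names are hypothetical, and the string split ignores the disulfide bond that keeps the two chains linked.

```python
def activation_split(seq: str, motif: str = "RVVGG") -> tuple[str, str]:
    """Cut a zymogen between the Arg and the first Val of the activation motif."""
    i = seq.find(motif)
    if i < 0:
        raise ValueError(f"activation motif {motif} not found")
    cut = i + 1  # the scissile bond follows the Arg (P1) residue
    return seq[:cut], seq[cut:]

def trypsin_like_sites(seq: str) -> list[int]:
    """1-based positions of Lys/Arg residues after which a trypsin-like enzyme may cut.

    Simplification: context effects such as a following proline are ignored.
    """
    return [pos for pos, aa in enumerate(seq, start=1) if aa in "KR"]

# Residues flanking the activation site, as printed in Fig. 1.
fragment = "CGKSLKTPRVVGGEEASVDSWPWQV"
pro_part, catalytic_part = activation_split(fragment)
print(pro_part)                   # CGKSLKTPR
print(catalytic_part)             # VVGGEEASVDSWPWQV
print(trypsin_like_sites("IPR"))  # [3]  (Ile-Pro-Arg, the trypsin-like assay substrate)
```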

As anticipated, overexpression of the 2.3-kb transcript was found by Northern blot analysis in 9 of 13 primary pancreatic carcinoma tissues (Fig. 2) and in 10 of 16 pancreatic carcinoma cell lines (not shown). Because TMPRSS3 was not expressed in normal pancreas (n = 6) or in chronic pancreatitis (n = 6) tissue samples, overexpression appears to be cancer-specific and not due to inflammatory alterations in the stroma. No clear correlation was found between the stage of pancreatic tumors and the expression of the protease (Table 1). Northern blot analyses with RNA from a small number of other tumor tissues revealed that TMPRSS3 overexpression is not restricted



Table 1 TNM classification of pancreatic cancer patients
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Fig. 3. a, hydropathicity plot of the predicted TMPRSS3 protein. The method of Kyte and Doolittle (20) was used, with a window of 17 residues (http://bioinformatics.weizmann.ac.il/hydroph/). The peak spanning amino acids 32-53 represents the putative transmembrane domain. b, schematic representation of the different domains of TMPRSS3, a type II membrane-associated serine protease. Numbers correspond to the amino acids deduced from the cDNA sequence shown in Fig. 1. The disulfide bonds were deduced based on the structure of TMPRSS1 and TMPRSS2, the most homologous proteins. pot., potential.
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to pancreatic cancer, but can also be found in gastric (h = 4), expressed in HEK.-293 cells, immunoprecipitated, and treated with 



colorectal (w = 7), and ampullary (« = 1) cancer No expression was 
found in one tissue sample each of soft tissue sarcoma and breast 
cancer (Fig. 2). TMPRSS3 transcripts were not detectable in normal 
heart, brain, placenta, lung, liver, skeletal muscle, uterus, and adipose 
tissue. A weak signal was found in tissues of the normal gastrointes- 
tinal tract (esophagus, stomach, small intestine, colon) and in some 
tissues of the urogenital tract (kidney and bladder). Nevertheless, 
expression was much weaker than in the corresponding tumors (data 
not shown). Furthermore, we analyzed the expression of TAiPRSS3 in 
the SUIT-2 clonal cell lines S2-007, S2-013, and S2-028 (4). These 
subclones of the human pancreatic cancer cell line SUIT-2 differ in 
their spontaneous metastatic potential after s.c. injection in nude mice. 
In this setting S2-007 regularly shows a high rale of metastases, 
whereas the other two cell lines show a lower rate (S2-013) or no 
metastases at all (S2-028). As shown in Fig. 2, the strength of 
TMPRSS 3 Gxprtssion correlated well to the metastatic potential of the 
SUIT-2 subclones, which may serve as an indication that this serine 
protease is associated with the promotion of metastasis. 

The sequence of TMPRSS3 suggests that this novel serine protease 
contains a signal anchor characteristic for group II integral membrane 
proteins with a hydrophobic transmembrane domain (Fig. Za). Ac- 
cording to the charge difference rule (10), the transmembrane domain 
(amino acids 32—53) anchors the protease to the cell membrane. 
Because of this anchorage, the NH2-terminal domain (amino acids 
1-31) would appear to be located intracellularly, and the COOH- 
terminal region (amino acids 54-437), which contains the catalytic 
domain, would be located extracellularly (Fig. 36). The alleged sub- 
cellular localization of the protease was confimied using a V5-tagged 
TMPRSS3 construct, which was transiently transfccted into HEK-293 
cells. Membrane fractionation and Western blotting with the corre- 
sponding anti-V5 antibody revealed a signal only in the plasma 
membrane-enriched fraction, whereas no tagged TMPRSS3 protein 
was detectable in the cytosol and in the culture medium (Fig. 4), 

This experiment also uncovered post-translational modifications of 
TMPRSS3. Although the calculated theoretical molecular mass of the 
epitope-tagged fusion protein is 52 kDa, its size in a SDS-polyacryl- 
amide gel is -^68 kDa, suggesting the presence of potential carbohy- 
drate moieties. The primary sequence of TMPRSS3 displays two 
consensus motifs for A^-linked glycosylation (N-X-T/S) at N'**** and 
N'^**. To confirm this A^-glycosylation, epitope-tagged TMPRSS3 was 



PNGase F. This resulted in an increase in mobility on denaturing 
SDS-PAGE, demonstrating A^-glycosylation of TMPRSS3 (Fig. 4). 
Cultivation of transfected HEK-293 cells in the presence of tunicamycin, an inhibitor of N-glycosylation, revealed the same mobility shift of TMPRSS3 to a molecular mass of 60 kDa. Phenyl-N-acetyl-α-D-galactosaminide, which inhibits mucin-like O-glycosylation, had no effect on the molecular mass (data not shown). The generation of recombinant proteases frequently has been shown to be difficult or impossible (14). Despite extensive and repeated efforts, we were unable to generate recombinant protein in Escherichia coli or insect cells, possibly because TMPRSS3, like many other proteases, had a cytotoxic effect on transfected cells. Repeated efforts to generate peptide antisera failed as well (data not shown), and a TMPRSS3 antibody was therefore not available for further studies.
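Candidate N-glycosylation sites such as the two N-X-T/S motifs noted above are usually found by a simple pattern scan. A minimal sketch; it assumes the common convention that X may be any residue except proline (the text itself only states N-X-T/S), and it finds sequons, which are necessary but not sufficient for glycosylation. That is why the PNGase F and tunicamycin experiments were still needed. The function name and the test peptide are hypothetical.

```python
import re

# N-X-S/T with X not proline; the zero-width lookahead lets overlapping
# sequons (e.g. N-N-S-T) both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def sequon_positions(seq: str) -> list[int]:
    """1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]

print(sequon_positions("MNGSANPTQNLT"))  # [2, 10]  (N6-P-T is skipped)
```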

Whereas the established physiological role of the chymotrypsin family of secreted serine proteases is primarily in protein catabolism, the function of serine proteases of the TMPRSS family is of special interest. Although the function of TMPRSS2 remains unknown (12, 15), TMPRSS1, also known as hepsin, is frequently overexpressed in ovarian tumors and may therefore contribute to the invasive nature or growth capacity of ovarian tumor cells (16). Treatment of hepatoma cells with antihepsin antibodies or specific antisense oligonucleotides confirmed that hepsin plays an essential role in cell growth and maintenance of cell morphology (17). It has also been shown that hepsin can proteolytically activate human coagulation factor VII and thereby contribute to the activation of the coagulation cascade (18).
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Fig. 4. Western blot analysis of V5-tagged TMPRSS3 protein. Protein extracts from TMPRSS3-pcDNA6/V5/His-transfected HEK-293 cells were resolved by 9% SDS-PAGE and transferred to nitrocellulose membranes. Membranes were immunoblotted with an anti-V5-horseradish peroxidase antibody followed by chemiluminescence detection. a, 20 µg of total protein extract. b, subcellular localization: C, cytosolic fraction; M, plasma membrane-enriched fraction; S, concentrated culture medium. c, analysis of N-linked glycosylation of the TMPRSS3 protein. A shift in molecular mass was detected both after PNGase F treatment of the immunoprecipitated protein and after exposure of the transfected cells to tunicamycin, indicating N-glycosylation of the protein.
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

The correlation of TMPRSS3 expression with the metastatic potential of the SUIT-2 cell lines is a first indication that this new protease, like hepsin, may be involved in promoting metastasis formation and tumor invasion. To test this hypothesis in functional assays, stably transfected S2-020 cell lines were generated using the TMPRSS3 cDNA cloned in sense and antisense orientation into the pHβ-Apr-1-neo vector. Several clones were generated showing variable degrees of TMPRSS3 sense/antisense mRNA transcription. Two sense and two antisense clones were further characterized by s.c. injections in nude mice, by in vitro Matrigel invasion assays, and biochemically for their capacity to hydrolyze substrates for trypsin and elastase. No significant differences could be observed between sense and antisense clones in any of the functional assays. There was no difference in tumor size and local invasiveness after s.c. injections, and there was no evidence of metastasis formation after tail vein injection with either sense or antisense cells. Similarly, we failed to show an effect on in vitro invasiveness or on the proteolytic activity of native and enterokinase-treated lysates for a selection of serine protease substrates. Many factors may be responsible for the failure of TMPRSS3-transfected tumor cells to behave differently in these assays, including the necessity for a complex activation mechanism, processes that affect protein folding, or the absence of essential cofactors. Furthermore, although transiently transfected HEK-293 cells showed expression of the V5-tagged recombinant TMPRSS3 protein, we could not directly demonstrate expression of the protein in the stably transfected S2-020 cells because we lacked a specific antibody. In the absence of final experimental proof, we can therefore only hypothesize, based on the structural characteristics and the expression pattern in cancer tissues and in the SUIT-2 subclones, that this new protease has a potential role in tumor progression, metastasis formation, and tumor invasion.

Proteases have an important function in the context of tumor growth because they can break down the surrounding extracellular matrix components, pave the way for spreading tumor cells, and release and activate growth and angiogenic factors. Protease activity on the surface of tumor cells is required to allow malignant invasion through surrounding connective tissue, which is an important event in the multistep process of metastasis formation (19). Thus, it is conceivable that TMPRSS3 may contribute to the invasive and metastatic potential of tumor cells. In this context, cell surface proteases such as TMPRSS3 may function as activators of other extracellular proteases or act directly by degrading the extracellular matrix surrounding the tumor cells. Furthermore, TMPRSS3, as shown for many other proteases, may participate in the activation of hormones or growth factors by proteolytic cleavage of inactive proforms. Because the biochemical events required for the activation of this novel serine protease are unknown and its specific substrates have not yet been identified, the precise role of TMPRSS3 in carcinogenesis remains to be elucidated.
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INTRODUCTION 

In this review we attempt a timely survey of issues concerning protein translocation across the membrane of the endoplasmic reticulum of eukaryotic cells. We focus on recent developments, open questions, and current controversies. Due to limited space, this review cannot be and is not
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intended to be comprehensive. Where appropriate, reference to more 
detailed reviews is given in the text. 

Eukaryotic cells contain a multiplicity of membrane-delimited compartments. The selective localization of particular proteins provides the basis for each of these compartments to serve various specialized functions. Thus, for example, the mitochondrion is the exclusive residence of enzymes involved in oxidative phosphorylation; similarly, oxidative detoxification takes place exclusively in the endoplasmic reticulum (ER). The proteins that compose, and are contained within, particular membrane systems are kept there by the impermeability of the lipid bilayer to diffusion of proteins across membranes. How, then, is compartmentalization of newly synthesized proteins achieved, given that the cytosol is the common site of synthesis for the majority of proteins, though they are destined for distinct subcellular locations? The term intracellular protein topogenesis has been coined (Blobel 1980) to describe the specialized mechanisms by which newly synthesized proteins selectively overcome the permeability barrier of specific intracellular membranes to achieve their correct subcellular localization. This review addresses the question of how proteins that pass through or reside in the intracisternal space are specifically synthesized on membrane-bound ribosomes and translocated into the ER lumen.

As in the study of other protein translocation events (e.g. across mitochondrial membranes), there are two fundamental issues to resolve regarding transport across the ER membrane: (a) How is the target membrane recognized and distinguished from all other membrane systems? (b) Once it has been targeted, how is the polypeptide chain translocated across the lipid bilayer into the lumen of the organelle?






HISTORICAL BACKGROUND

The work of Palade and coworkers on the secretory pathway (reviewed by Palade 1975) focused attention on ribosomes bound to the rough endoplasmic reticulum as the site of synthesis of secretory proteins. The subsequent demonstration of vectorial discharge of puromycin-released polypeptides into the lumen of isolated rough microsomal vesicles (Redman & Sabatini 1966) suggested that a specialized mechanism was responsible for translocation across the ER membrane: nascent polypeptides emerged into the lumen of the microsomal vesicles concomitant with their synthesis. These results raised the intriguing question of how the cell could distinguish the mRNAs for secretory proteins from those for cytoplasmic or mitochondrial proteins and selectively translate the former on ER-bound ribosomes.
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The signal hypothesis (Blobel & Dobberstein 1975) was proposed to 
account for these phenomena. Over the last 15 years, overwhelming evidence has accumulated from a plethora of experimental systems in favor of this model. As it specifically relates to secretory proteins, the essential tenets of an updated version of this hypothesis (for a recent review see Walter et al 1984) are that: (a) the information for localization of newly synthesized proteins into the lumen of the ER is encoded in a discrete segment of the nascent polypeptide, the signal sequence; (b) this signal sequence interacts with a series of receptors, some of them cytoplasmic, others integral to the ER membrane. Some of these receptors function in targeting the chain to the ER membrane, others in its actual translocation across that membrane. These latter receptors, together with associated proteins in the ER membrane, constitute the "translocon," a postulated engine able to drive signal sequence-bearing chains across the ER membrane through a proteinaceous pore or channel.

More recently, the concepts of the signal hypothesis have been expanded to describe a general framework for intracellular protein topogenesis (Blobel 1980). According to this model, "topogenic sequences" within discrete segments of targeted proteins are decoded by specific receptors, either during (cotranslational) or shortly after (posttranslational) their biosynthesis. The specificity of such signal sequence-receptor interactions targets the proteins to the correct intracellular membranes, where they are fed into translocons that move them across the hydrophobic core of the lipid bilayer. Similarly, it has been proposed that another class of topogenic sequences, termed stop-transfer sequences, interacts with the translocon to arrest further transport and thereby achieve an asymmetric transmembrane orientation of integral membrane proteins. Thus many of the concepts developed in this review for soluble ectoplasmic proteins are directly applicable to the problem of integration of transmembrane proteins. Recent developments reviewed below suggest that translocons in different intracellular membrane systems may function more similarly than previously thought.



MECHANISM OF TARGETING 



With the availability of in vitro systems that faithfully reproduce the 
translocation of nascent proteins [secretory proteins (Blobel & Dobberstein 1975), lysosomal proteins (Erickson et al 1983), and certain classes
of integral membrane proteins (Katz et al 1977)], it became feasible to 
investigate the molecular requirements for protein translocation across the 
ER membrane. So far, two components, the signal recognition particle 
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(SRP) and the SRP receptor, have been purified and shown to function in 
the targeting events preceding the actual translocation event. 

Signal Recognition Particle 

SRP is an 11 S small cytoplasmic ribonucleoprotein (Walter & Blobel 
1982). In our current view, SRP functions as an adapter between the 
protein synthetic machinery in the cytoplasm and the protein translocation 
machinery in the ER membrane. 



STRUCTURE OF SRP SRP was first recognized by its ability to restore the translocation activity of salt-extracted microsomes in vitro (Warren & Dobberstein 1978). It was purified to homogeneity from a salt extract of canine pancreatic microsomal vesicles using this activity as an assay (Walter & Blobel 1980). SRP consists of a small (300-nucleotide) 7SL RNA (Walter & Blobel 1982) and six nonidentical polypeptide chains organized into four SRP proteins. These proteins are two monomers, a 19-kDa polypeptide and a 54-kDa polypeptide, and two heterodimers, one composed of a 9-kDa and a 14-kDa polypeptide and the other of a 68-kDa and a 72-kDa polypeptide (Siegel & Walter 1985). When SRP is disassembled under nondenaturing conditions, the RNA and the protein fractions are inactive by themselves, but together they can readily be reconstituted into an active particle (Walter & Blobel 1983; Siegel & Walter 1985).

Recent studies revealed that different assayable functions of SRP in the targeting process can be assigned to specific structural domains of the particle. These separable functions include the recognition of signal sequences and the ability of SRP to arrest specifically the translation of nascent signal sequence-bearing proteins (Siegel & Walter 1986b). These domains are schematically indicated in Figure 1, superimposed on the secondary structure of 7SL RNA. This model is supported by recent evidence demonstrating that SRP is a rod-shaped, elongated structure (Andrews et al 1985) and that the RNAs, visualized directly by electron spectroscopic imaging, span the entire length of the particle (D. W. Andrews et al, submitted for publication).

SIGNAL RECOGNITION Once SRP had been purified to homogeneity it
became possible to study its activity in greater detail. Results of exper- 
iments testing both the effects of SRP on the translation of secretory 
proteins and its binding properties with various components in the trans- 
lation-translocation system have led to the model of the SRP cycle shown 
in Figure 2. 

In brief, SRP is thought to bind in a signal-sequence-independent 
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manner with relatively low affinity to biosynthetically inactive ribosomes 
(Figure 2a, b) (Walter et al 1981). Upon emergence of a signal sequence as 
part of the nascent polypeptide chain, the affinity of SRP for the ribosome 
increases (Figure 2c); in the case of preprolactin synthesized on wheat 
germ ribosomes this increase amounts to three to four orders of magnitude. 
The SRP-ribosome-nascent chain complex is then targeted to the membrane of the ER via a direct interaction of SRP with the SRP receptor (Walter & Blobel 1981b), an integral membrane protein that is restricted in its subcellular localization to this membrane system (Hortsch et al 1985). At this point SRP and the SRP receptor detach from the ribosome and can reenter the cycle, i.e. both molecules are thought to act catalytically in the targeting process. The ribosome-nascent chain complex engages in a functional ribosome-membrane junction, and the translocation of the nascent polypeptide proceeds (see below). (For a more detailed description of the SRP cycle see Walter et al 1984.)
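The SRP cycle summarized in Figure 2 can be read as a small state machine in which each step is allowed only from the preceding one. The toy model below encodes just that ordering (low-affinity binding to a ribosome, high-affinity binding once a signal sequence emerges, docking with the SRP receptor, release and recycling); its state and event names are hypothetical labels, and it says nothing about rates, affinities, or the degree of elongation arrest.

```python
from enum import Enum, auto

class SRPState(Enum):
    FREE = auto()           # SRP free in the cytoplasm
    LOW_AFFINITY = auto()   # bound to a ribosome with no signal sequence exposed
    SIGNAL_BOUND = auto()   # signal sequence emerged: affinity raised, elongation arrested or slowed
    DOCKED = auto()         # complex bound to the SRP receptor on the ER membrane

# Allowed transitions, keyed by (current state, triggering event).
TRANSITIONS = {
    (SRPState.FREE, "bind_ribosome"): SRPState.LOW_AFFINITY,
    (SRPState.LOW_AFFINITY, "release"): SRPState.FREE,
    (SRPState.LOW_AFFINITY, "signal_emerges"): SRPState.SIGNAL_BOUND,
    (SRPState.SIGNAL_BOUND, "meet_receptor"): SRPState.DOCKED,
    # SRP and receptor detach; the ribosome stays at the membrane and SRP recycles.
    (SRPState.DOCKED, "handoff_to_membrane"): SRPState.FREE,
}

def run_cycle(events: list[str], state: SRPState = SRPState.FREE) -> list[SRPState]:
    """Apply events in order and return the visited states; reject illegal steps."""
    history = [state]
    for event in events:
        try:
            state = TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"{event!r} is not allowed in state {state.name}") from None
        history.append(state)
    return history

steps = ["bind_ribosome", "signal_emerges", "meet_receptor", "handoff_to_membrane"]
print([s.name for s in run_cycle(steps)])
```

Making targeting depend on the SIGNAL_BOUND state mirrors the point in the text that SRP engages the receptor only as part of the signal-bearing complex, and returning to FREE reflects its catalytic reuse.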

ELONGATION ARREST When SRP is included in in vitro translation systems in the absence of microsomal membranes, it blocks protein synthesis concomitant with the increase in its affinity for the ribosome just after the signal peptide becomes exposed outside the large ribosomal subunit (Walter & Blobel 1981b; Meyer et al 1982a). In some cases a discretely sized protein fragment that corresponds to the elongation-arrested secretory protein can be detected by gel electrophoresis; in other cases the arrested forms appear as a broader smear on gels, which indicates that SRP can recognize signal sequences and arrest elongation within a certain range of chain lengths. It is also observed that some nascent polypeptides are arrested, while others transiently pause in chain growth (P. Walter, unpublished results). Therefore, in these latter cases arrest is often difficult to detect (Meyer 1985). Interestingly, while elongation arrest has been demonstrated as a kinetic delay of elongation in translation systems reconstituted from mammalian components (K. Matlack & P. Walter, unpublished results), the same effect is more pronounced (as a strict blockage of elongation) when signal-bearing proteins are translated in a heterologous wheat germ system. Thus, while the general phenomenon of arrested elongation is ubiquitous, different in vitro systems reflect it to different degrees. Therefore it remains to be established whether SRP acts in vivo as a strict "on-off" switch or functions as a more graded rate-controlling factor.

Two distinct biochemical approaches were employed to map the elongation-arrest function to a separate and separable domain of SRP. One
functional domain was shown to consist of the 9/14-kDa SRP proteins and 
those 7SL RNA sequences that are homologous to repetitive Alu DNA 
(see Figure 1, left). One experimental approach employed single omission 
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experiments in which SRPs were reconstituted from fractionated and 
purified protein and RNA components (Siegel & Walter 1985). A second 
approach involved the preparation of a subparticle obtained after nucleolytic dissection of SRP (Siegel & Walter 1986). These perturbed SRPs
lacking the elongation-arrest domain are still active in signal recognition 
and targeting; therefore, elongation arrest cannot be a prerequisite for protein translocation across the membrane. In the absence of elongation arrest, however, most signal-bearing nascent proteins lose their ability to
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be translocated if elongation proceeds beyond a critical point in the absence 
of membranes. Thus elongation arrest seems to maintain the nascent chain 
in a translocation-competent state by preventing (or delaying) its further 
elongation into the cytoplasmic space and thereby adds to the fidelity of the 
reaction. The particular length range in which a nascent protein remains 
translocation competent may vary for different proteins (see below). 

Since SRP contains an RNA as a structural component, it is tempting to speculate that this RNA engages in base-pairing interactions with other nucleic acids during SRP's functional cycle. The RNA components in the translational apparatus are likely candidates for participants in such interactions (Walter & Blobel 1982; Zwieb 1985). However, there is at present no direct evidence for such interactions. A possible mechanism for elongation arrest could involve the binding of 7SL RNA to the A-site on the ribosome, thus preventing the next aminoacyl-tRNA from binding. Indeed, the secondary structure of 7SL RNA in the elongation-arrest
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Figure 1 Domain structure of SRP (left) and the SRP receptor (right). (a) (From Siegel & Walter 1986a): SRP is composed of two separable domains. A possible phylogenetically conserved secondary structure for 7SL RNA is shown (Siegel & Walter 1986a). Similar secondary structures have been proposed by Gundelfinger et al (1984), E. Ullu (personal communication), and Zwieb (1985). Connecting lines between the RNA strands indicate base pairs; G-U pairs are included. (For an extensive description of SRP structure see Siegel & Walter 1986b.) Micrococcal nuclease cleaves the particle at the point indicated by arrows, removing the elongation-arresting domain. Additional cuts mapped by Gundelfinger et al (1983) are indicated by arrowheads. The elongation-arresting domain includes both ends of the RNA (labeled 5' and 3') and is composed of sequences that are homologous to the repetitive Alu DNA sequence family. Evolutionary considerations suggest that 7SL RNA is the parent molecule for repetitive Alu DNA (Ullu & Tschudi 1985). The thin dashed lines indicate the boundaries of homology between 7SL RNA and an Alu consensus sequence. The elongation-arresting domain also contains the 9/14-kDa SRP protein. The other domain, termed SRP(S), retains signal recognition and translocation-promoting function and is composed of the middle portion of 7SL RNA (the S-segment) and the remaining three SRP proteins. As mentioned in the text, the 54-kDa SRP protein can be selectively cross-linked to signal peptides and may therefore provide the signal-binding pocket. (b) (From Lauffer et al 1985): A model of the disposition of the SRP receptor α-subunit in the membrane of the ER is shown. Putative structural and functional features as deduced from the primary sequence (Lauffer et al 1985) are indicated. Regions I and II are putative membrane-spanning regions; whether both of them or either one alone functions as the membrane anchor of the receptor or if additional hydrophobic regions are contributed by the β-subunit is presently
not known. Regions III-V contain the charge clusters described in the text. The boxed 
domain contains regions strongly resembling RNA binding proteins ; their presence suggests 
that the SRP-SRP receptor interaction may include binding of 7SL RNA to this domain. 
The arrow indicates the position of the protease-sensitive site. Cleavage of the receptor at 
this position results in the release of the 52-kDa cytoplasmic fragment. This fragment does 
not have two properties of the intact receptor : the binding affinity for SRP and the ability 
to release elongation arrest (Lauffer et al 1985 ; Gilmore et al 1982a). 
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domain of SRP resembles that of a tRNA that is missing the anticodon 
stem. In addition, the physical dimensions of SRP would easily allow the 
particle to bridge the distance between the nascent chain exit site on the 
ribosome (where the signal sequence emerges) and the peptidyl transferase 
activity known to be located between the two ribosomal subunits (Andrews 
et al 1985). 

Signal Sequences 

What constitutes the essential features of a signal sequence and how such sequences are recognized by SRP remain unsolved problems. Signal sequences show no recognizable primary sequence homology, and a recent compilation shows that sequence variation can be rather extreme (von Heijne 1985). Yet studies on a variety of systems both in vivo and in vitro demonstrate conservation of signal sequence function over the widest evolutionary distances (Müller et al 1982). As a consequence we are still not able to predict with confidence which regions in proteins might function as internal signal sequences. Nevertheless, internal signal sequences have been demonstrated unequivocally (Bos et al 1984). Moreover, cleavage by signal peptidase is not required for translocation (Palmiter et al 1978).

One of the few characteristic features of signal sequences is a variable stretch of hydrophobic amino acids in the core of the sequence. Point mutations in the hydrophobic core of bacterial signal sequences have been shown to abolish function (Lee & Beckwith 1986, this volume). Based on the hydrophobicity of these regions and on evidence from biophysical studies with synthetic signal peptides (reviewed by Briggs & Gierasch 1986), it has been suggested that these sequences act as amphiphiles that are integrated into and possibly perturb lipid bilayers. There is, however, still no evidence that the general mechanism for translocation involves a direct interaction of signal sequences with the hydrophobic core of the lipid bilayer. Indeed, several lines of evidence suggest direct interactions of signal sequences with proteins.

The clearest evidence for such interactions involves SRP. Since SRP is a soluble ribonucleoprotein, its interactions with signal sequences can be studied in the absence of membranes by measuring binding or by observing the SRP-mediated modulation of protein synthesis. For example, when signal sequences that are rich in leucine are translated in the presence of the amino acid analog β-hydroxyleucine, SRP signal recognition is abolished (Walter et al 1981; Walter & Blobel 1981b). This demonstrates that SRP directly recognizes features in the nascent chain. Moreover, the finding conclusively rules out the possibility that sequences in the mRNA alone are responsible for the observed effect. (After the discovery of an RNA component in SRP the latter notion was considered attractive because of the possibility of recognition via putative base-pairing interactions.) Direct proof of an SRP-signal sequence interaction was recently provided by cross-linking experiments. Two groups independently showed that a photoactivatable cross-linking reagent was selectively incorporated into the amino-terminal region of the signal peptide of nascent preprolactin. Each group found that the signal peptide is in direct contact with the 54-kDa SRP protein (Kurzchalia et al 1986; Krieg et al 1986).

SRP Receptor 

Using the same in vitro protein translocation assays that led to the purification of SRP, two distinct approaches were taken to identify the corresponding membrane components involved in targeting of signal sequence-bearing nascent chains to the ER membrane. These approaches eventually led to the discovery and purification of the SRP receptor, the first membrane protein proven to play a vital role in this process.



One of these approaches was based on the early observation that proteolysis of microsomal membranes completely abolishes their protein translocation activity but that, most importantly, the activity can be restored by addition of an extract prepared by limited proteolysis of the original microsomal membrane fraction (Walter et al 1979; Meyer & Dobberstein 1980a). This proteolytic dissection and functional reconstitution provided the assay for the purification of the protease-solubilized component. The activity was purified as a basic 52-kDa protein (apparent mobility on SDS PAGE is 60 kDa) (Meyer & Dobberstein 1980b), which was subsequently demonstrated (by immunological techniques) to be a proteolytic fragment derived from a 69-kDa integral membrane protein (apparent mobility 72 kDa) restricted in its subcellular localization to the endoplasmic reticulum (Meyer et al 1982b).

The second approach took advantage of the observations that, when assayed in the absence of microsomal membranes, SRP causes a site-specific elongation arrest in the synthesis of presecretory proteins, and that microsomal membranes contain an activity that releases the elongation arrest. Based on these observations, the elongation-arrest-releasing activity was predicted to reside in a membrane protein termed the SRP receptor (Walter & Blobel 1981b) [subsequently named the docking protein (Meyer et al 1982a)]. Fractionation of a detergent extract of microsomal membranes employing affinity chromatography on SRP-Sepharose as a key step allowed purification of the SRP receptor. The purified fraction contained a predominant 69-kDa membrane protein and the arrest-releasing activity. Using both immunological and peptide-mapping techniques, the SRP receptor was shown to be identical to the membrane protein identified via the proteolytic dissection methods described above (Gilmore et al 1982a,b).

Recently, the primary structure of the 69-kDa SRP receptor protein was determined from its cognate cloned cDNA, and its relationship to the cytoplasmic SRP receptor fragment was established (Lauffer et al 1985). This fragment was shown to begin with residue 152 of the intact protein. Thus, it is sequences within the 151 amino-terminal residues that anchor the SRP receptor in the lipid bilayer. Two distinctly hydrophobic regions have been identified that constitute putative α-helical transmembrane segments. Since either of these segments would position a positively charged amino acid in the hydrophobic core of the lipid bilayer, the receptor probably interacts with other integral membrane proteins that neutralize these charges. Recent evidence suggests the existence of proteins that can be copurified with the 69-kDa SRP receptor protein or isolated by affinity techniques. In particular, an ER membrane protein with an apparent molecular weight of 30 kDa was found by a variety of techniques to be tightly associated with the 69-kDa protein (Tajima et al 1986). Thus the SRP receptor appears to be a heterodimeric protein that, in addition to the 69-kDa polypeptide (the SRP receptor α-subunit), contains a second 30-kDa subunit (β-subunit). Carboxy-terminal to the putative transmembrane regions in the α-subunit is an unusually hydrophilic domain. In particular, unusually large clusters of charged amino acids are found surrounding the site of proteolytic cleavage that severs the 52-kDa cytoplasmic domain (see Figure 1, right). This domain of the SRP receptor strongly resembles nucleic acid binding proteins, which suggests that the receptor may transiently interact directly with the 7SL RNA in SRP and that the SRP-SRP receptor affinity could be mediated, at least in part, by a protein-nucleic acid interaction.

The SRP receptor is unlikely to be part of the translocon itself, because the receptor is present in the ER membrane in substoichiometric amounts with respect to membrane-bound ribosomes. Thus it was suggested that the SRP receptor functions "catalytically" and is recycled once correct targeting of the ribosome has been achieved (Gilmore & Blobel 1983). There is also evidence for an additional activity that is distinct from SRP and the SRP receptor and may interact with the targeted signal sequence and act as a secondary signal receptor(s) in the ER membrane (Gilmore & Blobel 1985; Prehn et al 1980). However, a protein serving this function has not yet been identified.

MECHANISM OF TRANSLOCATION 

Machinery 

Cell-free systems provided a detailed molecular description of the targeting machinery but have yet to allow insights into the molecular details of the translocation process. In part this difficulty results from the apparent obligate coupling of translocation and translation: Transport across the ER membrane takes place cotranslationally; completed precursors are not detectable in vivo in the cytoplasm. In cell-free systems translocation proceeds only during a limited time and under the fastidious conditions required for the synthesis of the very molecule whose translocation is being studied. As a result, although several specific polypeptides have been implicated as functional components of the translocon, the direct role of any of these proteins remains to be demonstrated. For example, two integral membrane proteins, termed ribophorins, have been suggested to act as ribosome receptors (Kreibich et al 1978); the recent purification of signal peptidase, a relatively abundant complex of six polypeptides, suggests that these proteins are involved in other functions besides signal cleavage (Evans et al 1986).

Translocation Substrates 

Although we know little about the actual machinery involved, insight into certain aspects of the mechanism of translocation has recently been obtained by approaches involving manipulation of the translocation substrates. For example, expression of engineered cDNAs encoding fusion proteins in transcription-linked translation systems demonstrated that a signal sequence was sufficient to direct translocation of normally cytoplasmic globin, both in vitro (Lingappa et al 1984) and in vivo (K. Simon et al, submitted for publication). Thus, the specific information for translocation was contained within the signal sequence and not the "passenger" protein.

A more complex version of these experiments raised interesting questions as to the mechanism of translocation (Perara & Lingappa 1985). The DNA sequence coding for globin, normally a cytosolic protein, was fused with the 5' end of the DNA sequence for preprolactin, a secretory protein that has an amino-terminal signal sequence. This fusion protein thus contained the preprolactin signal sequence at an internal position, 117 amino acids from the initiator methionine. When expressed in a transcription-linked translation system, this internal signal sequence was not only cleaved by signal peptidase, but also directed the translocation of both flanking protein domains. Surprisingly, carbonate extraction demonstrated that neither the globin domain with the signal sequence attached at its carboxy terminus nor the prolactin domain was integrated into the membrane. Instead, both resided in the vesicle lumen, either free or bound to proteins. This result suggests that signal sequences are not buried in the bilayer directly but perform their function by interacting with a proteinaceous machinery in the membrane. Moreover, translocation of the globin domain by a subsequently emerging signal sequence suggests that the energy used for the globin domain's synthesis is not required for its translocation. Thus the commonly observed coupling of translocation and translation may not be an obligate requirement for transport across the ER membrane.

The notion that the translocation machinery can function independently 
of protein synthesis has now received direct support from different experi- 
mental systems. 

Posttranslational Translocation in Yeast

Recently, in vitro translation-translocation systems from the yeast Saccharomyces cerevisiae have been established (Hansen et al 1986; Waters & Blobel 1986; Rothblatt & Meyer 1986). The precursor to the yeast pheromone α-factor has been used as a model secretory protein. Contrary to all expectations, this precursor, an approximately 18.5-kDa protein, is translocated across yeast ER membranes posttranslationally, i.e. after it has been completely synthesized and released from ribosomes. Prepro-α-factor has no particularly hydrophobic or amphipathic stretches in its primary sequence (other than a typical signal sequence), making it unlikely that its posttranslational translocation is due to some passive partitioning of the protein across the lipid bilayer. Furthermore, the posttranslational translocation reaction is ATP-dependent and requires protein elements both in the membrane and in the soluble fraction. Whether these protein components are related in any way to the putative yeast SRP and SRP receptor analogs remains to be established by biochemical analysis. It is clear from these data, however, that translocation of prepro-α-factor does not require coupling to protein synthesis. Therefore, the translocon can, in principle, accept its substrate posttranslationally and in the absence of the ribosome.

It should be kept in mind that the posttranslational translocation of prepro-α-factor was observed in vitro in a system artificially depleted of ER membranes during synthesis. This finding does not prove that prepro-α-factor ever crosses the ER membrane posttranslationally in vivo, where ER membranes are always present during translation. Rather, the actual degree of coupling of translocation and protein synthesis will depend on the relative rates of the respective processes. If targeting and translocation are fast with respect to protein elongation, a strictly vectorial cotranslational translocation mode will result, as appears to be the rule in mammalian cells in vivo (Bergman & Kuehl 1979; Glabe et al 1980).

Posttranslational Translocation of Genetically Engineered Substrates

Similar findings also emerged from the use of engineered clones in mammalian cell-free translation systems (Perara et al 1986; Mueckler & Lodish 1986). Using a procedure that generates a truncated mRNA lacking a termination codon, secretory polypeptide chains could be synthesized and presented to membranes in the absence of further chain elongation while still held by the ribosome that effects their synthesis. It was demonstrated that such chains could be translocated and that nucleoside triphosphates were required as the energy source for this process. In contrast to the situation in the yeast system described above, in most of these cases translocation could be abolished by releasing the nascent chain from the ribosome by artificial termination with the aminoacyl-tRNA analog puromycin. As expected, translocation was abolished by deletion of the coding region for the signal sequence. In some cases, however, it was also found that some short chains could translocate in a ribosome-independent manner, analogous to that found for prepro-α-factor in the yeast system (E. Perara & V. R. Lingappa, submitted for publication). Thus it appears that, at least for the proteins investigated, polypeptide chain growth proceeds through stages in which translocation competence is either a property of the chain itself or is maintained by interaction with the ribosome (see Figure 3).
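The inference drawn from this assay (and laid out in the Figure 3 legend) reduces to a small decision rule over two outcomes: whether a truncated chain translocates while still held by the ribosome, and whether it still translocates after puromycin release. The Python sketch below is illustrative only; the function name `classify_stage`, the `Stage` labels, and the choice to reject the one outcome the scheme does not describe (translocation only after release) are assumptions, not part of the published protocol.

```python
from enum import Enum


class Stage(Enum):
    """Stages of chain growth as described for Figure 3 (labels are illustrative)."""
    I = "translocation competent, ribosome independent"
    II = "translocation competent, ribosome dependent"
    III = "not translocation competent"


def classify_stage(translocated_on_ribosome: bool,
                   translocated_after_puromycin: bool) -> Stage:
    """Infer the stage of a truncated chain from the two arms of the assay.

    translocated_on_ribosome: outcome when the chain, still held by the ribosome
        (truncated mRNA lacking a termination codon), is presented to microsomes.
    translocated_after_puromycin: outcome when the chain is first released from
        the ribosome with puromycin and then presented to microsomes.
    """
    if translocated_after_puromycin and not translocated_on_ribosome:
        # Assumption: this combination is not part of the scheme, so flag it.
        raise ValueError("release from the ribosome should not create competence")
    if translocated_after_puromycin:
        return Stage.I    # competence is a property of the chain itself
    if translocated_on_ribosome:
        return Stage.II   # puromycin release abolishes translocation
    return Stage.III      # no translocation in either arm


if __name__ == "__main__":
    for on_ribosome, released in [(True, True), (True, False), (False, False)]:
        print(on_ribosome, released, classify_stage(on_ribosome, released).name)
```

The rule encodes only the qualitative logic stated in the text; the chain lengths at which a given protein passes from one stage to the next differ between proteins and are deliberately not modeled.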

These results show cotranslational translocation in a new light: The role of the membrane-bound ribosome is not to extrude or push the chain through the bilayer, as suggested by some observers (Wickner & Lodish 1985). Rather, translocation is catalyzed by an energy-consuming protein engine in the ER membrane, and the ribosome acts, in most but not all cases, as a ligand that maintains the translocation competence of the nascent chain.

CONCEPTS AND CONTROVERSIES

We have surveyed the development of ideas on the problem of trans- 
location of newly synthesized proteins across the ER membrane. Initially, 
attention was focused on the coupling of translocation to translation, a 
feature unique to translocation across the ER membrane. This has given 
way to the realization that obligate coupling to translation is not a pre- 
requisite for translocation and that transport across membranes of a 
variety of organelles may share common features. These include the 
involvement of a targeting receptor to discriminate among proteins 
intended for different destinations, a translocon that somehow transports 
the targeted protein across the bilayer, and a requirement for energy (derived from hydrolysis of nucleoside triphosphates or from an electrochemical gradient) to drive translocation. The recognition of these steps has resulted from the study of diverse proteins in a variety of organisms and from the study of "artifacts" generated in vitro, i.e. biochemically or genetically altered translocation machinery (Siegel & Walter 1986b) and substrates (Perara & Lingappa 1985), whose aberrant behavior has provided insight into fundamental details of the targeting and translocation problem. Even as new questions emerge, many old ones (e.g. the molecular nature of the signal sequence-receptor interaction) remain unanswered.

Other questions must now be reformulated. For example, in spite of the recent demonstration that the translocon in the ER membrane can, in principle, accept translocation substrates posttranslationally, translocation most likely occurs cotranslationally in vivo. The observation that most posttranslational translocation across the ER membrane appears to be ribosome dependent in vitro supports this notion. As described earlier, ribosome-independent and ribosome-dependent modes of posttranslational translocation across the ER membrane probably reflect the requirements for maintenance of the "translocation competent state" of the nascent chain (see Figure 3). Loss of translocation competence may be due to folding (aberrant or normal) or oligomerization of the protein, or to entanglement of the signal sequence with the rest of the chain such that the resulting structure can no longer functionally interact with either the targeting or translocation machinery. A few proteins (such as yeast prepro-α-factor) retain translocation competence even as free, completed polypeptides. For most proteins, however, translocation competence is restricted to a generally narrow range of chain lengths. This range can be extended if the polypeptide is targeted to the membrane while still attached to the ribosome. However, eventually most proteins reach a point in chain elongation where translocation competence is no longer maintained, even when the protein is associated with the ribosome. One of the roles of the SRP-induced elongation arrest may therefore be to extend the effective range of translocation competence for the nascent polypeptide chains.

Previously, the nascent chain was thought to be vectorially translocated across the membrane as it emerged from the ribosome; the finding of posttranslational translocation raises the possibility that the translocon may be sufficiently pliable to accept (partially) folded domains rather than exclusively linear polypeptide chains. Alternatively, the translocon may effect unfolding of such domains prior to translocation. In either case the molecular environment traversed by the protein as it passes through the bilayer remains to be investigated. The finding that translocation is driven by nucleoside triphosphate hydrolysis is a direct demonstration of a protein engine in the membrane and rules out a spontaneous process previously suggested (Wickner 1979; Engelman & Steitz 1980). It remains to be established how the energy of hydrolysis is used by the translocon.

Figure 3 Ribosome dependence of translocation competence. This figure depicts the natural history of the relationship of chain growth (A) to translocation competence (C). The ribosome dependence of posttranslational translocation was assayed for various lengths of synthesized polypeptide. Progressively shorter polypeptides were synthesized by translating mRNA transcripts in vitro that were progressively truncated at their 3' end and therefore lacked termination codons (Perara et al 1986; E. Perara & V. R. Lingappa, manuscript in preparation). Ribosomes that have reached the 3' end of such a truncated mRNA appear unable to release the newly synthesized polypeptide. Release can be artificially achieved by treatment with puromycin. Such translocation substrates, either with or without release from the ribosomes (as indicated in B), can be assayed for translocation competence upon presentation to a microsomal membrane preparation in the presence of nucleoside triphosphate to supply energy. In this assay the ribosome dependence or independence of the translocation competence is reflected in the ability or inability of puromycin pretreatment to abolish translocation by releasing the chain from the ribosome (see right arms of branched arrows). (A) depicts three ribosomes on a polysome at various stages (I, II, and III) during the synthesis of a hypothetical secretory polypeptide chain. In (C) translocation competence as assayed posttranslationally (see above) is indicated (+). At stage I, the nascent chain is translocation competent, and this competence is independent of the presence of the ribosome, as experimentally demonstrated. As chain growth proceeds, the polypeptide enters stage II, where its translocation competence requires the ribosome. Finally, late in chain growth (stage III), the chain is no longer competent to interact with receptors and other proteins involved in translocation. Whether loss of translocation competence in stage III involves a loss of targeting function or loss of a productive interaction with the translocon remains to be determined. It is not known whether SRP is required for posttranslational translocation in either case.

Old controversies regarding co- versus posttranslational translocation 
appear to be resolved. In retrospect it could be concluded that many 
prokaryotic proteins (targeted to the plasma membrane) do not require 
ribosomes to maintain their translocation competence. This also appears 
to be the case for all proteins (so far studied) that are translocated across 
the peroxisomal membrane and the mitochondrial and chloroplast en- 
velopes. The most challenging problems for future research now include 
the further fractionation and purification of all the essential, as well as 
modulatory, components of the targeting and translocation machinery. 
This should ultimately allow their reconstitution in in vitro systems for 
the mechanistic analysis of their functions. Finally, our goal must be the 
understanding of how these components function in vivo. This should 
include elucidation of the regulatory or homeostatic mechanisms involved 
in harnessing such a remarkable set of protein machines as the translocons. 

Factor B is a serine protease, which despite its trypsin-like specificity has Asn instead of the typical Asp at the bottom of the pocket (position 189, chymotrypsinogen numbering). Asp residues are present at positions 187 and 226, and either one could conceivably provide the negative charge for binding the P1-Arg of the substrate. Determination of the crystal structure of the factor B serine protease domain has revealed that the side chain of Asp226 is within the pocket, whereas Asp187 is located outside the pocket. To investigate the possible role of these atypical structural features in substrate binding and catalysis, we constructed a panel of mutants of these residues. Replacement of Asp187 caused a moderate (50-60%) decrease in hemolytic activity, compared with wild type factor B, whereas replacement of Asn189 resulted in more profound reductions (71-95%). Substitutions at these two positions did not significantly affect assembly of the alternative pathway C3 convertase. In contrast, elimination of the negative charge from Asp226 completely abrogated hemolytic activity and also affected formation of the C3 convertase. Kinetic analyses of the hydrolysis of a P1-Arg-containing thioester by selected mutants confirmed that residue Asp226 is a primary structural determinant for P1-Arg binding and catalysis.



Complement is a major effector system of host defense. Ac- 
tivation of complement leads to the generation of protein frag- 
ments and protein-protein complexes that mediate acute in- 
flammatory responses, phagocytosis and killing of pathogens, 
and regulation of adaptive immune responses. Activation-asso- 
ciated production of biologically active protein fragments is 
catalyzed by a group of eight atypical complement serine pro- 
teases (SPs)^ of the chymotrypsin superfamily (1). Understand- 



* This work was supported by National Institutes of Health Grants 
AI21067 (to J. E. V,), NIAMS, National Institutes of Health Grant P60 
AR20614 R-3 (to Y. X.), and National Institutes of Health Grant 
AI39818 (to S. L. V. N.). The costs of publication of this article were 
defrayed in part by the payment of page charges. This article must 
therefore be hereby marked '^advertisement'" in accordance with 18 
U.S.C. Section 1734 solely to indicate this fact. 

§ To whom correspondence should be addressed: THT, Rm. 437, 1900 
University Blvd., Div. of Clinical Immunology and Rheumatology, Dept. 
of Medicine, Birmingham, AL 35294, Tel.: 205-975-6241; Fax: 205-934- 
2126; E-mail: rheu019@uabdpo.dpo.uab.edu. 

* The abbreviations used are: SP, serine protease; B-SP, the factor B 
serine protease domain; cCOLL, fiddler crab collagenase; CCP, comple- 
ment control protein module; CHO, Chinese hamster ovary; CoVF, 
cobra venom factor; EC3b, erythrocytes sensitized with C3b; hnELA, 
human neutrophil elastase; hPR03, human protease 3; mAb. mono- 



ing the structural basis for the highly restricted proteolytic 
activity of these SPs is an important first step toward pharma- 
cologic control of complement activation (2). 

Members of the chymotrypsin family have very similar 
three-dimensional structures but distinct substrate specifici- 
ties. To a gpreat extent specificity is determined by the side 
chains of the amino acid residues that line up the primary 
substrate specificity pocket (S^ site). The pocket has three walls 
formed by residues 189-195, 214-220, and 225-228 (chymot- 
rypsinogen numbering has been used for all SPs or SP domains 
throughout this paper) (3), The presence at the bottom of the 
pocket of Asp^*^ endows trypsin with preference for positively 
charged Arg and Lys residues (4, 5), whereas in chymotrypsin 
the specificity for bulky aromatics is largely determined by 
(ggj.189 Residues at position 216 and 226 also contribute to 
substrate specificity (7). All complement SPs exhibit trypsin- 
like specificity for positively charged Arg residues and all have 
an Asp at position 189, except for factor B and C2 (Fig. 1). 

Factor B and C2 are structurally similar modular proteins that play a central role in complement activation by providing the catalytic subunits of two key enzymes, namely the C3/C5 convertases of the alternative and the classical pathway, respectively. Complement convertases cleave the same single peptide bonds in C3 and C5. In addition to having Asn and Ser, respectively, instead of Asp at position 189, factor B and C2 also lack the highly conserved free N-terminal sequence of SPs. In typical SPs, the N-terminal sequence constitutes an essential structural element largely responsible for the transition from zymogen to active enzyme (8). Full expression of the proteolytic activities of factor B and C2 occurs only in the context of the complexes C3bBb(C3b) and C4b2a(C3b), respectively (9). The SP domain resides in the C-terminal half of Bb or C2a and is preceded by a von Willebrand factor type A module (VWFA), which is noncovalently associated with C3b or C4b, respectively, in a Mg²⁺-dependent manner. These atypical structural features of factor B and C2 indicate a novel activation mechanism and probably also a distinct substrate binding arrangement at the primary specificity pocket.

In addition to their natural protein substrates C3 and C5, factor B and C2 and their fragments Bb and C2a hydrolyze a small number of C3- and C5-like synthetic substrates (11-14). Overall, C3-like substrates are considerably more reactive than C5-like substrates. However, even toward their best substrates, the kcat/Km values of factor B, Bb, C2, and C2a are about 3 orders of magnitude lower than the 7.8 × 10⁶ M⁻¹ s⁻¹ value measured under the same conditions for the hydrolysis of the most reactive thioester by trypsin (14). By comparison, the catalytic efficiency (kcat/Km) of C3bBb for C3 cleavage was reported to be 3.1 × 10^ M⁻¹ s⁻¹ (10). No natural serine protease inhibitor has been found for factor B or C2, and regulation of the proteolytic activity of C3 convertases is effected largely through control of the assembly and decay of the bimolecular complexes. The structural correlates of the low esterolytic activity and extremely restricted substrate specificity, as well as the conformational change(s) associated with zymogen activation, are not understood. Determination of the structure of the factor B serine protease domain (B-SP) at 2.1-Å resolution has revealed the expected chymotrypsin fold but also unique features of surface loops and of the oxyanion hole. The backbone conformation of the pocket is similar to that of trypsin, but there are substitutions of functionally important residues. In this study we used site-directed mutagenesis to analyze possible effects of the factor B-specific residues on the assembly and activity of the C3 convertase. The data indicate that Asp226 is a primary structural determinant of P1-Arg binding and that the native conformations of Asp226 and Asn189 are important determinants for C3 cleavage.

clonal antibody; SBzl, thiobenzyl; VWFA, von Willebrand factor type A module; wt, wild type; Z, benzyloxycarbonyl; PAGE, polyacrylamide gel electrophoresis.

Fig. 1. Alignment of partial amino acid sequences of factor B, C2, chymotrypsin, and trypsin. Residues that form the walls of the primary specificity pocket are shaded. The catalytic triad residue Ser195 is boxed and marked by an asterisk. Arrows indicate residues targeted for site-directed mutagenesis. Numbers at the top are for residues of the chymotrypsinogen sequence and those at the bottom are for the factor B sequence. CHT, bovine chymotrypsin; TRP, bovine trypsin; HC2, human C2; HFB, human factor B.

EXPERIMENTAL PROCEDURES 

Construction of Mutant Factor B cDNA — The factor B cDNA clone BHL4-1 (15) in the expression vectors pRc/CMV or pcDNA3 (Invitrogen, Carlsbad, CA) was used as wild type (wt) template in site-directed mutagenesis. Factor B mutant cDNA constructs were obtained by the method of Zoller and Smith (16) as modified by Kunkel (17). Alternatively, the QuikChange site-directed mutagenesis kit (Stratagene, La Jolla, CA) was used according to the manufacturer's protocol. All cDNA constructs of mutant factor B were verified by restriction mapping and dideoxynucleotide sequencing (18) of the region around the mutation. Oligonucleotides were synthesized by the phosphoramidite method (19) using a DNA/RNA synthesizer (Model 394, Applied Biosystems, Foster City, CA).

Expression of wt and Mutant Factor B cDNA — Transient transfection of COS cells with 30-40 µg of cDNA was performed by electroporation as described (20). Cell culture supernatant containing secreted factor B proteins was harvested 72-90 h after transfection. Cell debris was removed by centrifugation, and the supernatant was stored frozen at -80 °C in small aliquots. The concentration of recombinant factor B in the medium was measured by enzyme-linked immunosorbent assay (15), using a rabbit anti-human Bb IgG (50 µg/ml) as capturing antibody and the mouse anti-Ba monoclonal antibody (mAb) HA4-1D5 (1.5 µg/ml) as reporter. The assay was developed with a 1:1000 dilution of affinity-purified goat anti-mouse IgG1 alkaline phosphatase conjugate (Southern Biotechnology Associates, Birmingham, AL) and Sigma substrate 104 (Sigma). Color development was measured at 405 nm. The concentration of factor B was calculated from a standard curve constructed using human serum of known factor B concentration. The sensitivity of the assay was approximately 1-2 ng/ml, and the concentration of specific protein in the culture medium ranged from 0.3 to 2 µg/ml.
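
Reading an unknown off the standard curve can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes A405 is linear in factor B concentration across the standards (many ELISAs instead need a four-parameter logistic fit), and the standard values, the sample reading, and the 1:100 dilution are hypothetical numbers chosen for the example.

```python
# Hypothetical sketch: back-calculate factor B concentration from an ELISA
# standard curve, assuming A405 is linear in concentration over the standards.


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_a405(
    a405: float,
    std_conc_ng_ml: list[float],
    std_a405: list[float],
    dilution: float = 1.0,
) -> float:
    """Invert the standard curve, then correct for the sample dilution."""
    slope, intercept = fit_line(std_conc_ng_ml, std_a405)
    return (a405 - intercept) / slope * dilution


# Made-up standards from a serum of "known" factor B concentration (ng/ml).
std_conc = [2.0, 5.0, 10.0, 20.0, 40.0]
std_a405 = [0.05, 0.125, 0.25, 0.5, 1.0]

sample_ng_ml = concentration_from_a405(0.30, std_conc, std_a405, dilution=100)
print(f"factor B in culture medium: {sample_ng_ml / 1000:.2f} ug/ml")
```

With these made-up standards, an A405 of 0.30 back-calculates to 12 ng/ml in the well, or 1.2 µg/ml after the 1:100 correction, which lies inside the 0.3-2 µg/ml range reported above.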

To obtain large amounts of recombinant proteins, stable transfection of Chinese hamster ovary cells (CHO-K1, ATCC) was carried out with selected mutants by a modification of a previously described method (21). CHO-K1 cells were maintained in Ham's F-12 (Cellgro, Herndon, VA) supplemented with 10% heat-inactivated fetal bovine serum (Life Technologies, Grand Island, NY) and 2 mM glutamine at 37 °C in a humidified, 5% CO2 incubator. Forty micrograms of each CsCl-purified plasmid DNA was transfected into 4-6 × 10^ CHO-K1 cells by electroporation as described (21). Selection of neomycin-resistant cells was started 72 h after transfection with 750 µg of G418 (Cellgro) per ml of the above medium. Subcloning of the G418-resistant cells was performed approximately 7 days after initiating selection by limiting dilution of cells at 0.8 cell/well in 96-well tissue culture plates. Clones were allowed to grow in G418-containing medium with 15% heat-inactivated fetal bovine serum for 10-12 days before screening for factor B production by enzyme-linked immunosorbent assay. The highest producing wt and mutant factor B clones were selected, expanded, and adapted to large-scale production by growing in suspension culture for 2 weeks. Protein purification was facilitated by culturing cells in ExCell 301 serum-free medium (JRH Bioscience, Lenexa, KS) supplemented with 0.5-2% fetal bovine serum, 2 mM glutamine, and 200 µg/ml G418.
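
Seeding at 0.8 cell/well is a trade-off between empty wells and wells that receive more than one cell, and Poisson statistics make that trade-off concrete. The sketch below assumes cells are distributed independently among wells (a Poisson model, which the source does not state) and that every seeded cell grows; it computes the expected fractions of empty, single-cell, and multi-cell wells.

```python
import math

# Hypothetical sketch: assumes cells land in wells independently (Poisson),
# with a mean of 0.8 cell/well as in the subcloning step described above,
# and that every seeded cell grows into a visible colony.
MEAN_CELLS_PER_WELL = 0.8
WELLS_PER_PLATE = 96


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a well receives exactly k cells."""
    return lam**k * math.exp(-lam) / math.factorial(k)


p_empty = poisson_pmf(0, MEAN_CELLS_PER_WELL)
p_single = poisson_pmf(1, MEAN_CELLS_PER_WELL)
p_multiple = 1.0 - p_empty - p_single
# Among wells that receive at least one cell, the fraction seeded by one cell.
p_single_given_growth = p_single / (1.0 - p_empty)

print(f"empty wells:        {p_empty:.3f}")
print(f"single-cell wells:  {p_single:.3f}")
print(f"multi-cell wells:   {p_multiple:.3f}")
print(f"clonal | any cells: {p_single_given_growth:.3f}")
print(f"expected seeded wells per plate: {WELLS_PER_PLATE * (1 - p_empty):.0f}")
```

Under this model roughly 45% of wells stay empty and about 65% of the wells that show growth started from a single cell, so a growing well is more likely than not, but far from certain, to be clonal.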

Purification of Recombinant wt and Mutant Factor B — One to two liters of the stably transfected CHO cell culture medium were harvested, concentrated to approximately 150 ml, and applied to a 30-ml column of CM Sephadex C-50 equilibrated with 0.1 M sodium acetate, 20 mM ε-amino-n-caproic acid, 20 mM EDTA, pH 6.5. Factor B was eluted with a gradient of 0-0.2 M NaCl in the starting buffer. For further purification, factor B-containing pools were dialyzed against 20 mM Tris-HCl, pH 8.0, and subjected to fast protein liquid chromatography using a Mono-Q column (Amersham Pharmacia Biotech). Factor B was eluted with a gradient of 0-0.3 M NaCl in the starting buffer. For some mutants, Mono-Q chromatography was repeated. The purity of factor B proteins, assessed by 10% SDS-PAGE, was between 80 and 95%.

Reactivity of Factor B Mutants with Module-specific MAbs — Two anti-Ba mAbs, HA4-1D5 (a subclone of HA4-1A) and FD3-20, and an anti-Bb mAb, HA4-15, were described previously (22). The mAb 6B3.3 was raised using as antigen the recombinant factor B VWFA module expressed in Escherichia coli. Reactivity of factor B mutants with these mAbs was examined by an enzyme-linked immunosorbent assay similar to that described above. The same rabbit anti-human Bb IgG antibody was used in the solid phase, and each of the four mAbs was used as detectant at a concentration of 1.5 µg/ml. The assay was developed with goat anti-mouse IgG + IgM alkaline phosphatase conjugate (Jackson ImmunoResearch Laboratories, Inc., West Grove, PA) and phosphatase substrate Sigma 104. Values obtained for each mAb were normalized to those measured for HA4-1D5 and represent the average of two separate experiments.
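
The normalization and averaging step can be sketched as below. Dividing by the HA4-1D5 signal presumably corrects for differences in how much factor B was captured in each well, since HA4-1D5 also served as the reporter for quantitation; that rationale, the background-corrected readings, and the order of operations (normalize each experiment, then average) are assumptions for illustration.

```python
from statistics import mean

REFERENCE_MAB = "HA4-1D5"  # reference reporter named in the text


def normalize_to_reference(signals: dict[str, float]) -> dict[str, float]:
    """Divide each mAb signal by the HA4-1D5 signal from the same experiment."""
    ref = signals[REFERENCE_MAB]
    return {mab: value / ref for mab, value in signals.items()}


def average_normalized(runs: list[dict[str, float]]) -> dict[str, float]:
    """Normalize each experiment separately, then average across experiments."""
    normalized = [normalize_to_reference(run) for run in runs]
    return {mab: mean(run[mab] for run in normalized) for mab in normalized[0]}


# Hypothetical background-corrected A405 values for one mutant, two experiments.
experiment_1 = {"HA4-1D5": 0.80, "FD3-20": 0.60, "HA4-15": 0.40, "6B3.3": 1.00}
experiment_2 = {"HA4-1D5": 1.00, "FD3-20": 0.80, "HA4-15": 0.45, "6B3.3": 1.30}

result = average_normalized([experiment_1, experiment_2])
for mab, value in result.items():
    print(f"{mab}: {value:.3f}")
```

Comparing each mutant's normalized profile with that of wt factor B is what would reveal a lost or altered epitope; identical profiles are consistent with preserved module conformation.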

Solid-phase Cobra Venom Factor (CoVF) Binding Assay — Binding of wt and mutant factor B to CoVF was determined by enzyme-linked immunosorbent assay as described (23). Culture medium from transfected COS cells containing wt or mutant factor B was dialyzed against half-strength veronal-buffered saline (0.5 × veronal-buffered saline, 2.5 mM sodium 5,5-diethyl barbiturate, pH 7.4) containing 5 mM MgCl2 at 4 °C overnight. Serial dilutions of factor B in the same buffer were then added to microplates coated with CoVF (Quidel, San Diego, CA). Binding of factor B to CoVF was allowed to occur in the absence or presence of 1.5 µg/ml factor D at 37 °C for 2 h. Bound factor B or Bb was detected with rabbit anti-Bb IgG (50 µg/ml) and goat anti-rabbit IgG alkaline phosphatase conjugate. Results represent the average values of two separate experiments.

CoVF-mediated Factor B Cleavage by Factor D — COS cells (4-6 × 10^) were transiently transfected by electroporation with wt or mutant factor B cDNA as described above. The cells were metabolically labeled 72 h later in 1 ml of Dulbecco's modified Eagle's medium without methionine, supplemented with 250 µCi of [35S]Met (specific activity 1000 Ci/mmol; Amersham Pharmacia Biotech or ICN Radiochemicals, Irvine, CA), for 30 min and chased with cold methionine in Dulbecco's modified Eagle's medium supplemented with 10% heat-inactivated fetal bovine serum. After a 3-h chase, 650-µl aliquots of the culture supernatants were collected, supplemented with 25 mM Tris-HCl, pH 7.4, and 2.5 mM MgCl2, and incubated for 2 h at 37 °C with factor D (300 and 2 ng) in the absence or presence of 5 µg of CoVF. Labeled factor B and Bb were immunoprecipitated using rabbit anti-Bb IgG antibody and Staphylococcus aureus protein A and analyzed by SDS-PAGE as described (24). To assess factor B cleavage, gel slices corresponding to the autoradiographed bands and to blank spaces were cut and digested with 15% H2O2 at 56 °C overnight. The blank gel cuts were used to subtract background radioactivity. The released radioactivity was measured with Bio-Safe II scintillation fluid (RPI, Mount Prospect, IL) in an LKB liquid scintillation counter (Model 1215, LKB, Gaithersburg, MD) (25).
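
The background subtraction and percent-conversion arithmetic for the gel slices might look like the sketch below. The counts are invented, and the sketch assumes cpm are proportional to molar amounts, i.e., it ignores any difference in methionine content between intact factor B and Bb, which a rigorous molar comparison would have to correct for.

```python
# Hypothetical sketch: percent conversion of [35S]-labeled factor B to Bb from
# gel-slice counts. Assumes cpm are proportional to molar amounts, i.e. it
# ignores any difference in methionine content between intact factor B and Bb.


def net_cpm(band_cpm: float, blank_cpm: float) -> float:
    """Subtract the blank-gel background; clamp at zero."""
    return max(band_cpm - blank_cpm, 0.0)


def percent_converted(b_cpm: float, bb_cpm: float, blank_cpm: float) -> float:
    """Bb as a percentage of total (factor B + Bb) net counts in one lane."""
    b = net_cpm(b_cpm, blank_cpm)
    bb = net_cpm(bb_cpm, blank_cpm)
    return 100.0 * bb / (b + bb)


def percent_of_wt(mutant_pct: float, wt_pct: float) -> float:
    """Express a mutant's conversion relative to wt factor B in the same run."""
    return 100.0 * mutant_pct / wt_pct


# Made-up counts (cpm) for lanes with the high factor D concentration plus CoVF.
wt = percent_converted(b_cpm=1200, bb_cpm=3200, blank_cpm=200)      # 75%
mutant = percent_converted(b_cpm=3200, bb_cpm=1200, blank_cpm=200)  # 25%
print(f"wt {wt:.1f}%, mutant {mutant:.1f}%, mutant/wt {percent_of_wt(mutant, wt):.1f}%")
```

Expressing each mutant relative to wt in the same experiment is how figures such as "53, 27, and 16% of that of wt factor B" in the Results are framed.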

Factor B Hemolytic Assay — Sheep erythrocytes carrying C3b (EC3b) were prepared as described (22) using freshly purified human factor B (22), factor D (26), and C3 (27). Serial dilutions of culture medium containing wt or mutant factor B were added to 7.5 × 10^ EC3b, 12.5 ng of factor D, and 125 ng of properdin (Sigma) in a total volume of 150 µl of 0.5 × veronal-buffered saline containing 2.5% dextrose, 2.5 mM MgCl2, 10 mM EGTA, and 0.1% gelatin. Formation of the C3 convertase, C3bBb(P), was carried out at 30 °C for 30 min. Then 0.5 ml of guinea pig serum diluted 1:40 with 10 mM EDTA in veronal-buffered saline was added as a source of C3 to C9, and the reaction mixture was incubated for 1 h at 37 °C. Percent lysis and hemolytic units/µg were calculated as described (28). Values of specific hemolytic activity of each mutant were normalized to that of wt factor B and represent the mean ± S.E. of at least three independent determinations, each performed in duplicate.
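
The hemolytic calculation is cited rather than spelled out, so the sketch below rests on an assumption: that reference 28 follows the classical one-hit convention, in which the average number of lytic sites per cell is Z = -ln(1 - fraction lysed). The cell number and amount of factor B are placeholders. A real analysis would use dilutions in the linear part of the dose-response rather than a single well.

```python
import math
from statistics import mean, stdev

# Hypothetical sketch of hemolytic-activity arithmetic. Assumption (not stated in
# the text): reference 28 uses the classical one-hit convention, in which the
# average number of lytic sites per cell is Z = -ln(1 - fraction lysed).


def fraction_lysed(a_sample: float, a_blank: float, a_total: float) -> float:
    """Lysis relative to a buffer blank (0%) and a water-lysed control (100%)."""
    return (a_sample - a_blank) / (a_total - a_blank)


def z_value(y: float) -> float:
    """Average lytic sites per cell under the one-hit model."""
    if not 0.0 <= y < 1.0:
        raise ValueError("fraction lysed must be in [0, 1)")
    return -math.log(1.0 - y)


def units_per_ug(y: float, n_cells: float, ug_factor_b: float) -> float:
    """Effective hemolytic units (Z x cells) per microgram of factor B added."""
    return z_value(y) * n_cells / ug_factor_b


def mean_and_se(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# One illustrative well: 50% lysis gives Z = ln 2 (cell number and ug are placeholders).
y = fraction_lysed(a_sample=0.55, a_blank=0.05, a_total=1.05)
print(f"Z = {z_value(y):.4f}, units/ug = {units_per_ug(y, n_cells=1e7, ug_factor_b=0.01):.3g}")

# Mutant activity normalized to wt in three hypothetical independent experiments.
rel_mean, rel_se = mean_and_se([0.28, 0.30, 0.29])
print(f"relative activity = {rel_mean:.2f} +/- {rel_se:.3f} (S.E.)")
```

The Z transform matters because percent lysis saturates: doubling the convertase does not double lysis, whereas under the one-hit model Z scales linearly with the number of effective convertase sites.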

C3 Cleavage Assay — C3 was freshly isolated from the plasma of a normal individual as described (27), except that a final chromatographic step using hydroxyapatite fast protein liquid chromatography (Amersham Pharmacia Biotech) was added. Purified wt or mutant factor B (50 ng) was mixed with C3 (75 ng) with or without 150 ng of CoVF and 12.5 ng of factor D in a total volume of 25 µl of 25 mM Tris-HCl, pH 7.4, containing 75 mM NaCl and 5 mM MgCl2. After incubation at 37 °C for 1 h, 10 µl of each reaction mixture was analyzed on 7.5% SDS-PAGE.

C3 and C3 fragments were detected on Western blots using goat anti-human C3 IgG (Cappel, Durham, NC) and affinity-purified rabbit anti-goat IgG F(ab')2 horseradish peroxidase conjugate (ICN). The ECL luminescent detection system (Amersham Pharmacia Biotech) was used to visualize C3 polypeptide chains following the manufacturer's protocol. The amount of C3 conversion was determined by scanning the α and α' chains with a ScanMaker 5 scanner (Microtek Lab, Inc., Redondo Beach, CA), and band intensity was quantified using NIH Image 1.58 software.
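
The conversion estimate from the scanned α and α' bands reduces to a ratio, sketched below. The sketch assumes band intensity is proportional to the amount of each chain and that the α and α' chains react equally with the anti-C3 antibody; the intensities are invented, chosen only so the wt result matches the 45% conversion reported in the Results and the mutant comes out at 37% of wt.

```python
# Hypothetical sketch: percent C3 conversion from scanned Western-blot bands.
# Assumes band intensity is proportional to the amount of each chain and that
# alpha and alpha' chains react equally with the anti-C3 antibody.


def percent_alpha_converted(alpha: float, alpha_prime: float, background: float = 0.0) -> float:
    """alpha' as a percentage of (alpha + alpha') after background subtraction."""
    a = max(alpha - background, 0.0)
    a_prime = max(alpha_prime - background, 0.0)
    return 100.0 * a_prime / (a + a_prime)


# Illustrative integrated intensities (arbitrary units): wt gives 45% conversion,
# and the mutant is chosen so that its conversion is 37% of the wt value.
wt = percent_alpha_converted(alpha=5500, alpha_prime=4500)
mutant = percent_alpha_converted(alpha=8335, alpha_prime=1665)
print(f"wt {wt:.1f}%, mutant {mutant:.2f}%, mutant/wt {100 * mutant / wt:.0f}%")
```

Using the α' fraction within each lane, rather than absolute α' intensity, makes the estimate insensitive to small differences in how much C3 was loaded per lane.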

Esterolytic Assays — The rate of hydrolysis of Z-Lys-Arg-SBzl (Peninsula Laboratories, Belmont, CA) was measured by a modification of the method of Kam et al. (14). Assays were carried out in microplate wells. B-SP was expressed by Sf9 insect cells infected with recombinant baculovirus and isolated from serum-free ExCell 401 medium using Bio-Rex 70 and Mono S ion exchange chromatography. The recombinant B-SP consists of a vector-derived tripeptide, Ala-Asp-Pro, at the N terminus and the C-terminal 295 amino acid residues of factor B. Purified factor B or B-SP (0.11-0.2 µM) was added to 0.08 to 0.8 mM Z-Lys-Arg-SBzl and 1.6 mM Ellman's reagent, 5,5'-dithiobis-(2-nitrobenzoic acid) (Sigma), in 250 µl of 0.1 M HEPES, pH 7.5, containing 0.5 M NaCl and 16% Me2SO. Factor B was omitted from control wells used to measure background hydrolysis of the substrate. Esterolytic rates were measured kinetically for 15 min using a Vmax kinetic microplate reader (Molecular Devices, Menlo Park, CA). Kinetic constants were determined by the Lineweaver-Burk method based on at least five substrate concentrations. Correlation coefficients in all cases were greater than 0.98.
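
A Lineweaver-Burk analysis of the kind described above can be sketched as follows: 1/v is regressed on 1/[S], since 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, so Vmax = 1/intercept and Km = slope × Vmax. The rates here are synthetic and noise-free, generated from assumed constants (Km = 0.25 mM, kcat = 0.3 s⁻¹, 0.15 µM enzyme); the conversion of absorbance slopes to molar rates is omitted.

```python
import math

# Hypothetical sketch of a Lineweaver-Burk analysis: fit 1/v against 1/[S]
# by least squares, where 1/v = (Km/Vmax)(1/[S]) + 1/Vmax.


def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Return slope, intercept, and Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)


def lineweaver_burk(s_mM: list[float], v_uM_per_s: list[float], enzyme_uM: float) -> dict[str, float]:
    """Derive Km, kcat, and kcat/Km from a double-reciprocal fit."""
    slope, intercept, r = linear_fit([1 / s for s in s_mM], [1 / v for v in v_uM_per_s])
    vmax = 1 / intercept      # uM/s
    km_mM = slope * vmax      # slope = Km/Vmax, so Km is in mM
    kcat = vmax / enzyme_uM   # s^-1
    return {
        "Km_mM": km_mM,
        "kcat_per_s": kcat,
        "kcat_over_Km_per_M_s": kcat / (km_mM * 1e-3),  # convert Km to M
        "r": r,
    }


# Synthetic, noise-free rates from Km = 0.25 mM and kcat = 0.3 s^-1 with 0.15 uM
# enzyme, at five substrate concentrations spanning the 0.08-0.8 mM range used above.
enzyme = 0.15
substrate = [0.08, 0.1, 0.2, 0.4, 0.8]
rates = [0.3 * enzyme * s / (0.25 + s) for s in substrate]
fit = lineweaver_burk(substrate, rates, enzyme)
print({name: round(value, 4) for name, value in fit.items()})
```

With real, noisy data the double-reciprocal transform gives the lowest-[S] points the most weight, which is why a direct nonlinear fit to the Michaelis-Menten equation is a common alternative; the correlation-coefficient check mirrors the r > 0.98 criterion stated above.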

RESULTS 

To understand the structural implications of the unique factor B residues in and around the primary specificity pocket, the serine protease domain (B-SP) was expressed using a baculovirus system, and its crystal structure was determined at 2.1-Å resolution by multiple isomorphous and molecular replacement methods. As expected, B-SP was found to display a chymotrypsin-like, two-β-barrel structural fold. In the active center, the catalytic triad residues Asp102, His57, and Ser195 and the nonspecific substrate-binding site (Ser-Trp-Gly214-216) have typical serine protease configurations (Fig. 2). However, the oxyanion hole displays a zymogen-like conformation due to the inward orientation of the carbonyl oxygen atom of Arg192, the backbone of which, together with those of Cys191, Gly193, and Asp194, forms a single-turn 3₁₀ helix. The three walls of the primary specificity pocket are formed by residues 189-195, 214-220, and 225-228. The backbones of these residues, except for the single-turn helix, can be superposed on those of the corresponding residues of trypsin. Asn189 is located at the bottom of the pocket, replacing the highly conserved Asp of other SPs with trypsin-like substrate specificity. However, the side chain of Asp226, which replaces Gly226 of trypsin, extends toward the bottom of the pocket, suggesting that it may be directly involved in binding the P1-Arg of the substrate, substituting for Asp189 of other trypsin-like SPs. An Asp residue also replaces a conserved Gly of other SPs at position 187. Asp187 of factor B is located directly beneath the pocket and forms a salt bridge with a Lys residue. To investigate the possible participation of the three residues Asp226, Asn189, and Asp187 in substrate binding and catalysis, factor B mutants at these positions were constructed and assayed. In addition, the functional role of Pro188, which is not found at this position in other SPs, was also assessed. In most cases, two independent clones of each mutant were expressed and analyzed to avoid artifactual results. In all cases, results of functional analysis of the two clones of each mutant were consistent, suggesting that functional differences from wt resulted from the amino acid substitution at the mutation site.

Reactivity of Factor B Mutants with Module-specific MAbs — To probe for possible effects of the mutations on the overall structure of the molecule, we tested the reactivity of the mutants with a panel of module-specific mAbs. The anti-Bb mAb HA4-15 (22) has been shown to recognize an epitope on the SP domain (data not shown). MAbs FD3-20 (anti-CCP1-3) and HA4-1D5 (anti-CCP2) bind to distinct epitopes on the Ba fragment (29), while 6B3.3 (γ1,κ) recognizes an epitope on the VWFA module at or near the C3b-binding site (data not shown). We did not observe substantial differences in the reactivity of the mutants with the four mAbs (data not shown), suggesting that all epitopes tested are retained in their native conformation.

Formation of the CoVFB and CoVFBb Complexes — Expression of proteolytic activity by the factor B SP domain requires binding of factor B to C3b and its proteolytic cleavage by factor D. Introducing mutations in the SP domain could alter C3b binding and/or susceptibility to factor D cleavage, although these functions have been assigned to distal parts of the molecule, namely the CCP and VWFA modules (1). We examined the ability of factor B mutants to form the CoVFB and CoVFBb complexes. CoVF was chosen over C3b because of the much longer half-life of its complexes, which facilitates detection. All mutants showed dose-dependent binding to CoVF in the absence (data not shown) and presence (Fig. 3) of factor D. Binding to CoVF was enhanced in the presence of factor D for all mutants. Factor B carrying single mutations at position 187 or 189 had essentially the same binding activity as wt factor B, except for the D187Y mutant, which formed only about half as much CoVFBb as wt factor B. In the D226 panel of mutants, surprisingly, only D226N had wt binding activity. The same substitution combined with N189D resulted in a 50% reduction of binding to CoVF compared with either the D226N or the N189D mutant. The trypsin-like mutation D226G, alone or in combination with the N189D mutation, caused 60 and 87% reduction, respectively, in CoVFBb complex formation. Similar reductions in the CoVF binding ability of the mutants were also observed without factor D cleavage (data not shown). The results suggested that, with the exception of the D226N mutation, substitutions at position 226 affect the initial binding of factor B to CoVF and thus sensitivity to factor D proteolysis, since binding is a prerequisite for factor B cleavage. In a more direct factor B cleavage assay, conversion of biosynthetically labeled factor B to Bb by factor D in the presence of CoVF was analyzed by SDS-PAGE and autoradiography (Fig. 4). The results correlated well with the binding data. Mutant D226N was as sensitive to factor D cleavage as wt factor B. Mutants D226N/N189D, D226G, and D226G/N189D were less susceptible to factor D, with conversion to Bb estimated at 53, 27, and 16%, respectively, of that of wt factor B at the high concentration of factor D. The combined results suggest that, although the overall structural integrity of the mutants was preserved, as indicated by equivalent reactivity with the module-specific mAbs, amino acid substitutions in the SP domain apparently affected CoVF/C3b binding, which is mediated by sites on the other two domains of the molecule.

Fig. 2. Stereoview of the active center of the factor B serine protease domain. The side chains of the catalytic triad residues and of selected residues lining the pocket are shown. Hydrogen bonds between the carboxyls of Asp226 and the side chains of Asn189 and Thr190 are shown by dashed lines.

Hemolytic Activity of Factor B Mutants — The effects of the mutations on the ability of factor B to cleave and activate C3 and C5 were assessed by a hemolytic assay. The hemolytic activity of the mutants relative to that of wt factor B is illustrated in Fig. 5. Elimination of the negative charge of Asp187 in mutants D187A, D187N, and D187S resulted in a 50-60% loss of hemolytic activity. Substitution of Tyr at the same position caused a more pronounced decrease in hemolytic activity, approximately 80%. The data suggest that the bulky hydrophobic side chain of Tyr is not favored and that full expression of factor B hemolytic activity requires the salt-bridging conformation of Asp187. The Ala substitution at position 188 in the mutant P188A did not have a significant effect on hemolytic activity.

As revealed in the crystal structure, Asn189 and the side chain of Asp226 are located at the bottom of the primary specificity pocket and appear to be accessible to the P1-Arg of the substrate (Fig. 2). Replacement of Asn189 with charged residues, either Asp or Lys, reduced hemolytic activity by 95%, while the Ala mutant retained approximately 30% of wt activity. Although eliminating the negative charge from Asp226 in the D226N mutant did not affect the assembly of the CoVFBb complex (Fig. 3), it completely abrogated the C3/C5 convertase activity. Replacement of the same residue with Gly, the residue present in trypsin, also resulted in complete loss of hemolytic activity. Again, the loss of hemolytic activity was out of proportion to the only moderately reduced ability to form the CoVFBb complex (Fig. 3). Attempts to construct a trypsin-like pocket by reassigning the negative charge to position 189 in the double mutants D226N/N189D and D226G/N189D failed to restore factor B hemolytic activity, despite the residual CoVF binding activity (Figs. 3 and 6). The hemolytic data strongly indicate that Asp226 plays a critical and highly specialized role in the expression of C3/C5 convertase activity by factor B. Residues Asn189 and Asp187 are also important for expression of factor B-dependent proteolytic activity. In contrast, the Pro residue at position 188 has no apparent functional role and likely serves as a spacer between structurally crucial residues.

C3 Cleavage Assay — A decrease in factor B hemolytic activity could reflect a defect in C3 and/or C5 cleavage. The effects of the mutations on C3 proteolytic activity were assessed by a direct cleavage assay. Wt factor B and selected mutants were stably expressed in CHO cells and purified. Fluid-phase C3 convertases were formed with CoVF in the presence of factor D. Conversion of C3 to C3a and C3b was assessed by the appearance of the α' chain of C3b on SDS-PAGE (Fig. 6). As shown, under the experimental conditions used, wt factor B converted 45% of the α chain to α' chain, while no conversion was observed in controls lacking CoVF and factor D. The N189A mutant demonstrated 37% of wt proteolytic activity, consistent with the 29% of wt hemolytic activity expressed by this mutant (Fig. 5). As expected from their lack of hemolytic activity, there was no detectable C3 cleavage by the D226N and D226N/N189D mutants, even after prolonged exposure of the film. However, there was a trace amount of α chain cleavage by the N189D mutant, seen more clearly after long exposure of the film. The C3 cleavage study demonstrated that, at least for the factor B mutants tested, loss of hemolytic activity could be attributed to loss of proteolytic activity for C3.

Fig. 3. Assembly of solid-phase CoVFBb complex by wt and mutant factor B. Microtiter plates were coated with CoVF (10 µg/ml). Serial dilutions of wt and mutant factor B in culture supernatants of transfected COS cells were added and incubated with factor D (1.5 µg/ml) at 37 °C for 2 h. CoVF-bound Bb fragments were detected using rabbit anti-human Bb IgG and goat anti-rabbit IgG as detailed under "Experimental Procedures." Panel A, wt B, D187A, D187N, and D187Y; panel B, wt B, N189A, N189D, and N189K; panel C, wt B, D226N, D226N/N189D, D226G, and D226G/N189D.

Esterolytic Activity — Because C3 is a large protein substrate, extensive molecular contacts with C3b-bound Bb are probably required for its proteolysis. Hydrolysis of small synthetic thioester substrates containing Arg at the P1 site could provide further insights into substrate recognition. In the present study we chose Z-Lys-Arg-SBzl as substrate because it was shown to be the most reactive among the Arg-containing C3- or C5-like substrates tested by Kam et al. (14). The catalytic efficiency (kcat/Km) of recombinant wt factor B was 1135 M⁻¹ s⁻¹ (Fig. 7), which is similar to the value of 1370 M⁻¹ s⁻¹ reported previously for native factor B (14). The recombinant B-SP had a kcat/Km of 198 M⁻¹ s⁻¹, which is 5.7 times lower than that of intact factor B. Measurement of individual kinetic parameters showed that the decreased kcat/Km of B-SP was mainly due to a 4-fold increase in Km. Of the mutants tested, D226N showed a 50-fold slower catalytic rate than wt factor B. However, placement of a negative charge at position 189 on the D226N background partially restored esterolytic activity: the kcat/Km of the double mutant D226N/N189D was about 10-fold higher than that of D226N. As indicated by their lower-than-wt kcat and unaltered Km, the decreased catalytic efficiency of these two mutants could be attributed directly to the decreased catalytic rate. These results strongly suggest that the negatively charged Asp226 determines binding specificity and catalytic efficiency for the substrate Z-Lys-Arg-SBzl. Substitutions of Asp or Ala for Asn189 in N189D and N189A caused 2.7- and 6.6-fold lower activity, respectively. Although N189A factor B had slightly lower esterolytic activity than N189D factor B, it had substantially higher proteolytic activity for C3 (Fig. 6). Our findings demonstrated that, in addition to Asp226, Asn189 also participates in substrate recognition and in determining specificity for C3. Apparently, the structural configuration of residues Asp226 and Asn189 of factor B is critical for recognition and cleavage of C3 and C5.

Fig. 4. Cleavage of CoVF-bound factor B by factor D. [35S]Met-labeled wt and Asp226 factor B mutants secreted by transiently transfected COS cells were incubated with two different concentrations of factor D in the presence of 5 µg of CoVF for 2 h at 37 °C, or with the high concentration of factor D in the absence of CoVF as control. After incubation, immunoprecipitation was performed using a rabbit anti-human Bb IgG and S. aureus protein A. Immunoprecipitates were washed and subjected to 7.5% SDS-PAGE and autoradiography. Positions and molecular masses of marker proteins are given on the left.
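
The fold changes above can be related through the identity below, since catalytic efficiency is simply the quotient of kcat and Km. Treating the reported 4-fold increase in Km as exact (an approximation), the B-SP values imply that its kcat is roughly 70% of that of intact factor B, consistent with the statement that the loss of efficiency stems mainly from Km. The same identity shows why, with Km unchanged, the 50-fold drop in kcat for D226N is also a 50-fold drop in kcat/Km, and why the double mutant, about 10-fold above D226N, sits at roughly one-fifth of wt.

```latex
\begin{aligned}
\frac{(k_{cat}/K_m)_{\text{B-SP}}}{(k_{cat}/K_m)_{\text{B}}}
  &= \frac{k_{cat}^{\text{B-SP}} / k_{cat}^{\text{B}}}{K_m^{\text{B-SP}} / K_m^{\text{B}}}
  \;\Longrightarrow\;
  \frac{k_{cat}^{\text{B-SP}}}{k_{cat}^{\text{B}}} \approx \frac{198}{1135} \times 4 \approx 0.70 \\[4pt]
K_m^{\text{D226N}} = K_m^{\text{wt}}
  &\;\Longrightarrow\;
  \frac{(k_{cat}/K_m)_{\text{D226N}}}{(k_{cat}/K_m)_{\text{wt}}} = \frac{k_{cat}^{\text{D226N}}}{k_{cat}^{\text{wt}}} = \frac{1}{50},
  \qquad
  \frac{(k_{cat}/K_m)_{\text{D226N/N189D}}}{(k_{cat}/K_m)_{\text{wt}}} \approx \frac{10}{50} = \frac{1}{5}
\end{aligned}
```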

DISCUSSION 

Determination of the structure of the SP domain of factor B revealed a number of novel insertions and deletions compared with typical SPs, as well as certain unique structural features of the catalytic apparatus, especially in the primary specificity pocket (data not shown). In the present study, mutational analysis of factor B residues in and around the primary specificity pocket was performed to investigate structural correlates of substrate recognition at this site. The results are discussed in light of the large amount of available information on SP specificity.

Our results clearly demonstrate that Asp226 of factor B is a critical structural determinant for substrate binding and catalysis, substituting for Asp189 of other SPs with trypsin-like specificity. Functional analysis of the D226N mutant provided the most clear-cut results. The observed loss of esterolytic and proteolytic activity of this mutant could be attributed solely to a catalytic defect resulting from inappropriate engagement of the P1-Arg in the S1 site, while other functional sites necessary for proteolytic activation and substrate binding appeared to be well preserved. A sharp 50-fold decrease in catalytic rate (kcat) indicates that a negative charge at the bottom of the primary pocket is essential for efficient catalysis, but not for overall substrate binding affinity, because the Km is not altered by the Asn substitution (Fig. 7). Apparently, hydrogen bond formation of the P1-P3 residues to the nonspecific substrate-binding site, Ser-Trp-Gly214-216, and hydrophobic anchoring of the P2 and P3 side chains to the S2 and S3 pockets, respectively, provide sufficient binding force. It also seems likely that Asn226 provides additional binding energy, probably by hydrogen bonding with the P1-Arg. However, positioning of the scissile bond relative to Ser195 and the oxyanion hole through these putative hydrogen bonds may differ from that effected by the direct ionic contact made by Asp226 in wt factor B. Replacing Asp226 with Asn affected esterolytic and C3 proteolytic activity equally, although D226N factor B could form a CoVFBb complex. In a recent report, Hourcade et al. (30) also found that substitution of various residues (Asn, Ala, Ser, and Tyr) for Asp226 caused a severe reduction in proteolytic activity despite a normally assembled C3bBb complex. It is of special interest that the conservative substitution of Glu for Asp226 also abrogated C3 proteolytic activity. This observation suggests that accurate positioning of the carbonyl group of the P1-Arg of C3 relative to the nucleophilic Ser195 Oγ and the oxyanion hole can be achieved only by the native residue Asp226. A corresponding trypsin mutant, D189E, displayed a 2-3 orders of magnitude decrease in catalytic efficiency (kcat/Km), associated with a 40-fold shift in preference from Arg to Lys substrates relative to wt trypsin (31). Apparently, the additional methylene group distancing the carboxylate of trypsin D189E from the peptide backbone within the narrow pocket impeded proper positioning of the side chain of Arg, which is longer and larger than that of Lys. The loss of C3 catalytic activity by D226E factor B (30) can probably be attributed to a similar spatial effect.

Fig. 5. Hemolytic activity of factor B mutants. EC3b (1.5 × 10^) were incubated with serial dilutions of wt and mutant factor B in culture medium of transfected COS cells, factor D (12.5 ng), and properdin (125 ng) at 30 °C for 30 min. Hemolysis was allowed to occur at 37 °C for 1 h after addition of a 1:40 dilution of guinea pig serum in EDTA buffer. For each mutant, specific hemolytic activity (units/µg) was calculated and normalized to that of wt B. Each bar represents the average ± S.E. of the results of at least three separate experiments performed in duplicate.

Fig. 6. Proteolytic activity of C3 convertases formed by CoVF and wt or mutant factor B. Wt or mutant factor B (50 ng) and C3 (75 ng) were incubated for 1 h at 37 °C with (+) or without (-) CoVF (150 ng) and factor D (12.5 ng). Aliquots of the reaction mixture were analyzed on 7.5% SDS-PAGE under reducing conditions. C3 polypeptide chains were detected on Western blots using a goat anti-human C3 IgG. Positions and molecular masses of marker proteins are shown on the left. Positions of the α, α', and β chains of C3 are given on the right.

Another structural characteristic of the Sj pocket of factor B 
is a hydrogen bonding network formed by the carboxyl oxygens 
Fig. 7. Hydrolysis of synthetic thioester substrate by wt and
mutant factor B and the factor B serine protease domain. Purified
wt or mutant factor B or recombinant B-SP (113–200 nM) was
incubated with Z-Lys-Arg-SBzl at concentrations of 0.08–0.8 mM.
Hydrolysis was measured at 25 °C in the presence of Ellman's reagent,
5,5'-dithio-bis-(2-nitrobenzoic acid), used as a chromogen of hydrolysis.
Kinetic parameters were derived from Lineweaver-Burk plots. The
values of individual parameters are the average ± S.E. of at least three
independent determinations.

of Asp226 and the pocket residues Asn189, Thr, and Arg (Fig.
2). This effectively reduces the ionic bonding potential available
for making contacts with P1-Arg of the substrate. On one hand,
this distinct feature could explain the overall low esterolytic
activity of factor B, Bb (12–14), and B-SP (Fig. 7). On the other
hand, it implies the need for additional bonding between P1-Arg
and other pocket residues. The side chain of Asn189 faces the
carboxyl of Asp226 from the opposite wall and occupies a central
position at the bottom of the specificity pocket. Although the
Asn189 side chain sits about 0.5–1.0 Å lower than that of Asp226,
it appears accessible to the substrate. Our results indicate a
supporting role for Asn189 in substrate recognition and catalysis.
Substitution of Ala, Asp, or Lys at this position caused substantial
reduction or abrogation of hemolytic activity, which paralleled a
similar reduction in C3 proteolytic activity (Figs. 5 and 6). The
Ala substitution caused a decline in synthetic substrate binding
affinity (Km) and catalytic efficiency (kcat/Km), which strongly
indicates participation of Asn189 in substrate recognition. The
amide NH2 group of the Asn189 side chain may mediate P1-Arg
binding through a hydrogen bond. Absence of this potential binding
force may compromise accurate register of P1-Arg of C3 for
catalysis. Substitution of a charged residue, Asp or Lys, for Asn189
in N189D and N189K, respectively, abrogates C3 proteolytic activity
of the C3b- or CoVF-bound Bb. Interestingly, the N189D mutant
retains substantial esterolytic activity toward the synthetic
substrate. These results suggest that the reconstructed S1 pocket,
with free carboxyls at positions 226 and 189, could, despite its
altered geometry, register the Arg bond of the synthetic substrate,
but not that of C3, to the His57–Ser195 dyad. The free leading
or leaving group of the synthetic substrate may account for the
observed binding flexibility.
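The Fig. 7 legend states that kinetic parameters were derived from Lineweaver-Burk plots. As a minimal sketch of that step (not the authors' analysis code), the double-reciprocal form 1/v = (Km/Vmax)(1/[S]) + 1/Vmax can be fitted by ordinary least squares, and kcat then follows from Vmax divided by the total enzyme concentration. The function names, the units (mM substrate, µM/s rates, µM enzyme) and the enzyme concentration are illustrative assumptions; the demonstration data are generated from an assumed Km of 0.2 mM over the 0.08–0.8 mM range in the legend and are not the paper's measurements.

```python
def lineweaver_burk_fit(substrate_mM, rates):
    """Fit 1/v = (Km/Vmax) * (1/[S]) + 1/Vmax by ordinary least squares.

    Returns (Km, Vmax) in the units of the inputs.
    """
    xs = [1.0 / s for s in substrate_mM]  # 1/[S]
    ys = [1.0 / v for v in rates]         # 1/v
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # equals Km / Vmax
    intercept = mean_y - slope * mean_x   # equals 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


def kinetic_constants(substrate_mM, rates_uM_per_s, enzyme_uM):
    """Return (Km, kcat, kcat/Km), with kcat = Vmax / [E]total."""
    km, vmax = lineweaver_burk_fit(substrate_mM, rates_uM_per_s)
    kcat = vmax / enzyme_uM               # s^-1
    return km, kcat, kcat / km            # mM, s^-1, mM^-1 s^-1


if __name__ == "__main__":
    # Hypothetical noise-free Michaelis-Menten data (Km = 0.2 mM, Vmax = 0.15 uM/s),
    # with 150 nM enzyme, inside the 113-200 nM range quoted in the legend.
    km_true, vmax_true = 0.2, 0.15
    s = [0.08, 0.1, 0.2, 0.4, 0.8]
    v = [vmax_true * x / (km_true + x) for x in s]
    km, kcat, eff = kinetic_constants(s, v, enzyme_uM=0.15)
    print(f"Km = {km:.3f} mM, kcat = {kcat:.3f} s^-1, kcat/Km = {eff:.2f} mM^-1 s^-1")
```

Because the double-reciprocal transform gives the lowest-concentration points the most weight, a direct nonlinear fit to the Michaelis-Menten equation is often preferred with noisy data; the sketch only mirrors the method named in the legend.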

C2 and factor B have identical proteolytic specificity for
single Arg peptide bonds of C3 and C5, so their substrate-binding
sites can be presumed to be very similar in geometry and chemical
nature. Thus, it is not surprising that C2 has Asp and Ser at
positions 226 and 189, respectively (Fig. 1). Besides factor B and
C2, an acidic residue is also present at position 226 in a few
additional members of the chymotrypsin family, namely fiddler crab
collagenase (cCOLL) (32), human cathepsin G (CATG) (33), protease 3
(hPR3) (34), and neutrophil elastase (hnELA) (35). In contrast to
C2 and factor B, these serine proteases display relatively broad
substrate specificity. cCOLL and CATG recognize not only basic but
also large hydrophobic side chains (32, 36). Their Arg/Lys substrate
preference is mainly attributed to the presence of Asp226/Gly189 in
cCOLL and of Glu226/Ala189 in CATG within the S1 pocket. The large
and flexible S1 pocket in cCOLL allows this enzyme to adjust to
different side-chain shapes. Removal of the negative charge from the
cCOLL S1 pocket in the D226G mutant resulted in a significant
decrease of catalytic efficiency toward Arg/Lys substrates (37). As
with Asp226 in factor B and cCOLL, the corresponding Glu226 in human
CATG has only one carboxyl oxygen available for substrate binding
(33). This may be responsible for the relatively slow catalysis of
substrates with P1-Lys or -Arg. However, the presence of a negatively
charged residue at position 226 is not a sufficient condition for
specificity for basic residues. Neither hPR3 nor hnELA, both of
which have an Asp226, recognizes a P1-Lys or -Arg residue. The two
enzymes have closely similar S1 sites and cleave after small, mostly
hydrophobic residues, such as Leu/Ile (hnELA), Ala/Ser (hPR3), and
Val/Met (hnELA and hPR3) (38). The presence of Ile and Val at
position 190 of hPR3 and hnELA, respectively, seems partially
responsible for their substrate specificities. In hnELA, loss of
specificity for basic residues has been attributed to inaccessibility
of Asp226, which is shielded by Val190 and a second Val residue.
Similarly, Asp226 of hPR3 is shielded by Ile190 and a Val residue.
Taken together, the data indicate that Arg/Lys substrate specificity
is structurally determined not only by the presence but also by the
accessibility of an acidic side chain at the base of the specificity
pocket, positioned either at 189 or 226. The carboxyl oxygens of
Asp226 or Glu226 seem less available to substrate than those of
Asp189 because they participate in hydrogen-bonding networks with
residues on the wall of the pocket. This appears to be a distinct
feature observed in factor B, the neutrophil elastases, and cCOLL.

Structural and functional consequences of altering the
Asp189 of trypsin have been examined by site-directed
mutagenesis, kinetic, and crystallographic analysis (39). The
negative charge was relocated to the opposite wall of the binding
pocket in the rat trypsin mutant D189G/G226D. Kinetic analysis
showed that, compared with wt trypsin, this relocation of the
negative charge caused 10^- and 4.5 X 10^-fold decrease in 
catalytic efficiency (kcat/Km) toward P1-Arg- and P1-Lys-containing
substrates, respectively. The decrease resulted from a much
sharper decline in kcat for the Arg than for the Lys substrates,
whereas the binding affinity (Km) for both substrates was
equally reduced. The crystal structure of D189G/G226D trypsin
in complex with inhibitors showed that, in its new position,
Asp interacts extensively with other residues in the pocket
through hydrogen bonds, which greatly reduce its negative
charge potential. As in trypsin D189G/G226D, the native Asp226
of factor B forms hydrogen bonds, and this correlates with the
low binding affinity and overall low catalytic efficiency toward
P1-Arg/Lys peptide substrates (12–14). Reconstructing the pocket
of factor B in the D226N/N189D mutant caused complete loss of
hemolytic and C3 proteolytic activity (Figs. 5 and 6), although
esterolytic activity toward the P1-Arg thioester substrate was
partially retained (Fig. 7). The kinetic analysis showed that the
80% reduction in esterolytic activity (kcat/Km) was almost
entirely due to a reduction in kcat, whereas the Km was not
affected. Thus, the exact location of the negative charge at the
base of the S1 site, and particularly its spatial relationship to
the His57–Ser195 dyad and the oxyanion hole, which is altered in
trypsin D189G/G226D and factor B D226N/N189D, is especially
critical for efficient catalysis.
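The D226N/N189D argument rests on the identity (kcat/Km)mut / (kcat/Km)wt = [kcat,mut/kcat,wt] / [Km,mut/Km,wt]: when Km is unchanged, the entire loss in efficiency comes from kcat. The sketch below splits a change in catalytic efficiency into its kcat and Km contributions on a log10 scale. The function name and the numbers are hypothetical placeholders chosen to reproduce an 80% loss; they are not the measured Fig. 7 values.

```python
import math


def efficiency_change(kcat_wt, km_wt, kcat_mut, km_mut):
    """Compare the catalytic efficiency (kcat/Km) of a mutant with wild type.

    Returns the fractional reduction in kcat/Km plus the log10 contributions
    of kcat and of Km to that change (negative values mean a loss).
    The two contributions add up to log10 of the efficiency ratio.
    """
    ratio = (kcat_mut / km_mut) / (kcat_wt / km_wt)
    log_kcat = math.log10(kcat_mut / kcat_wt)
    log_km = math.log10(km_wt / km_mut)  # a higher mutant Km lowers kcat/Km
    return 1.0 - ratio, log_kcat, log_km


if __name__ == "__main__":
    # Hypothetical values: kcat falls fivefold while Km stays the same.
    reduction, d_kcat, d_km = efficiency_change(1.0, 0.3, 0.2, 0.3)
    print(f"reduction in kcat/Km: {reduction:.0%}")
    print(f"log10 contribution from kcat: {d_kcat:+.2f}, from Km: {d_km:+.2f}")
```

With these inputs the whole 80% loss is assigned to kcat and none to Km, which is the pattern the text reports for D226N/N189D.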

In an effort to directly compare factor B to trypsin, a Gly
residue was substituted at position 226 either alone (D226G) or
in combination with the N189D mutation (D226G/N189D). Neither
mutant had hemolytic activity. However, loss of hemolytic
activity could not be attributed exclusively to defective
substrate recognition at the S1 site, because the ability of these
mutants to participate in the assembly of the C3 convertase
was also affected (Figs. 3 and 5). Binding of the mutants to
CoVF and their sensitivity to factor D cleavage were substantially
decreased, indicating conformational changes near or at
the C3b/CoVF-binding sites, which are presumed to be distal to
the mutation sites. Because overall folding of the polypeptide
chain and the conformation of antigenic epitopes appeared
unaffected, the conformational alteration of the C3b-binding
site must be subtle, albeit functionally significant. At present it
is not clear how the catalytic center relates spatially to the
C3b/CoVF-binding sites. Hourcade et al. (30) also described a
conformational change at a site distal from the mutation in the
F227A mutant. The mutant was cleavable by factor D, but
cleavage did not promote the conformational change to the
high-affinity, C3b-binding, proteolytically active state that
characterizes wt factor B. The Bb fragment of this mutant was
recognized by a Bb-specific mAb at much lower efficiency than the
wt counterpart. As viewed in the structure of B-SP, the
RDFHIN225–230 segment forms an extended internal β-strand,
which is buried within the protein core. Substituting Ala for
Phe at position 227 might destabilize the core, affecting the
conformation of the surface epitope recognized by the
Bb-specific mAb (30). This epitope is probably located near the
RDFHIN225–230 segment and is only reactive in Bb, perhaps
because it is sterically hindered by the Ba region of intact factor
B or because it undergoes a conformational change upon
cleavage/removal of Ba. Our D226G mutants might have
conformational change(s) within the same region. However, the
relationship between the possible conformational change of the
antigenic epitope and that of the C3b-binding site is still
unclear.

It is of interest that the RDFHIN225–230 motif is found in
factor B and C2 of most animal species but is absent from all
other complement enzymes (1) as well as from other SPs of the
large chymotrypsin family (40, 38). This underlines the
fundamental role of Asp226 in the function of factor B and C2 in
complement activation. Therefore, the native conformation of
Asp226 and Asn189 or Ser189 within the S1 pocket of factor B and
C2, respectively, constitutes one of the structural determinants
that have evolved to optimize the highly specific C3/C5
cleavage. However, C3/C5 recognition and hydrolysis require
more extensive enzyme-substrate contacts than interaction of
the side chain of P1-Arg with residues of the S1 site. The
disparity in catalytic activity toward C3 and dipeptide
substrates of N189D and D226N/N189D factor B (Figs. 6 and 7)
probably reflects the complexity of the interaction between
C3b-bound Bb and its natural substrates, C3 and C5.

In the present study, we correlated the crystal structure of
B-SP with the detailed mutational analysis of the factor B S1
pocket. The resulting information contributes to the current
understanding of the structural basis for factor B and C2 substrate
specificity and catalysis. Such knowledge is crucial for designing
highly specific inhibitors that could have therapeutic potential
for complement-mediated human diseases.
A cDNA clone encoding enteropeptidase (EC 3.4.21.9), a key enzyme for the conversion of trypsinogen to
trypsin, was isolated from a rat duodenal mucosa cDNA library. The sequence of the 3585-base-pair clone predicted
that enteropeptidase is synthesized as a single-chain precursor, proenteropeptidase, consisting of 1058
amino acid residues with an internal signal sequence (51 residues), and is then processed into the mature enzyme
consisting of three different peptide chains, i.e., mini, light and heavy chains, rather than the previously reported
two-chain enzyme. The structure of enteropeptidase is relatively conserved among different species, and the rat
enzyme is 24 and 39 amino acids longer than the porcine and human ones, respectively. Northern blot
analysis of RNAs from normal rat tissues revealed that the enteropeptidase mRNA, around 4.4 kb in size, was
expressed only in the duodenal mucosa, and high proteolytic activity of the enzyme was detected in the proximal
small intestine. Additional analysis of the RNAs by RT-PCR revealed that a low level of the mRNA was also
expressed in the other parts of the small intestine, i.e., jejunum and ileum. These results indicate that the
biosynthesis of enteropeptidase takes place mainly in the proximal small intestine, the duodenum, and suggest the
importance of this region in the physiology of intestinal protein digestion regulated by the enzyme.
Furthermore, a faint signal of the mRNA was also detected in the stomach, colon and brain, in which the existence
of trypsin-like serine proteases has been reported. The significance of the low-level expression of the gene is unclear,
but a potential peptide-processing function of the enzyme in these tissues is also suggested. © 1996 Academic
Press, Inc.
Enteropeptidase (enterokinase, EC 3.4.21.9) was initially recognized as an intestinal factor that
activates the latent enzymes in pancreatic fluid. Later, the enzyme was shown to be involved in the
conversion of trypsinogen to trypsin (1), leading to the activation of various pancreatic zymogens
involved in the later stages of the digestive cascade. Therefore, enteropeptidase has been considered
a key enzyme in intestinal protein digestion. Because of its medical and physiological
importance, the enzyme has been purified from the small intestine of various species, including
bovine (2), porcine (3) and human (4). In addition, its cDNA structures have recently been
determined in these species by us and others (5–8). However, the details of the structure and
function of the enzyme are still unclear. For example, the number of peptide chains
composing the mature enzyme is reported differently depending on the species, and the mechanism
of enzyme activation remains to be elucidated. Also unclear is the regulatory mechanism of its
synthesis in the gastrointestinal tract. To clarify these problems, and because the laboratory
rat is a highly developed experimental model for studying the physiology of intestinal digestion, we
attempted to characterize rat enteropeptidase. In this study, we determined the nucleotide sequence
of the cDNA encoding rat enteropeptidase and the primary structure of the enzyme, and analyzed its
gene expression in the rat digestive tract and various other organs.

MATERIALS AND METHODS

Tissue preparation, RNA isolation and assay of enzymatic activities of enteropeptidase. All tissues were collected from
Wistar strain male adult rats (8 weeks old, Charles River Japan, Inc.). The excised tissues were washed with ice-cold
phosphate-buffered saline and stored frozen in liquid nitrogen until use. Total RNA was prepared from the tissues by
guanidinium isothiocyanate/cesium chloride density gradient ultracentrifugation, and poly(A)+ RNA was selected by
oligo(dT)-cellulose column chromatography. The proteolytic activity of enteropeptidase in the tissue samples was measured
fluorometrically by a modified method of Antonowicz et al. (9), using the synthetic substrate Gly-(Asp)4-Lys-β-
naphthylamide. Unless otherwise specified, 2 mM EDTA was included in the reaction mixture in this study.

Isolation and characterization of the cDNA clone for rat enteropeptidase. Rat duodenal mucosa poly(A)+ RNA was used
for the preparation of a cDNA library. Double-stranded cDNA was synthesized according to the procedure of Gubler and
Hoffman (10). After methylation of the internal EcoRI sites and addition of EcoRI linkers, the cDNAs were fractionated
according to size by agarose-gel electrophoresis. The cDNA larger than 1.5 kb in length was ligated into the EcoRI sites
of lambda ZAP II vector (Stratagene, USA). The phages were packaged, and recombinants were selected by plating on E.
coli strain XL1-Blue. Nylon filters carrying denatured recombinant DNAs were screened with [32P]-labeled porcine
enteropeptidase cDNA (7). The positively hybridizing clones were identified and isolated by repeated purification. The
purified phages were converted to the corresponding plasmid form using the plasmid excision procedure provided by the
manufacturer and were used as templates for DNA sequencing. Sequencing was performed by the dideoxy chain termination
method on both strands of denatured plasmid cDNA inserts using a Taq dye terminator sequencing kit (Applied Biosystems,
Inc.), a thermal cycler (model 480, Perkin Elmer Cetus), and a DNA sequencer (model 371A, Applied Biosystems, Inc.).

RNA detection by Northern blotting and RT-PCR. Total RNA (10 μg) from various rat tissues was denatured and
subjected to electrophoresis on a 0.66 M formaldehyde-agarose gel. After the RNA had been transferred to a nylon
membrane filter, the filter was hybridized with the [32P]-labeled full-length cDNA for rat enteropeptidase under
high-stringency conditions. The size of RNA was estimated by reference to the mobility of 18S and 28S rRNAs and fragments
of λDNA generated by digestion with HindIII. Primers specific for the amplification of the rat enteropeptidase heavy chain
(5' primer, 5'-ATTTGATGATGCrmTTG-3'; 3' primer, 5'-AGCriTGG I" I C 1 G G ATA AG-3'; size of the amplified
fragment, 491 bp) and G3PDH (5' primer, 5'-TGAAGGTCGGTGTCAACGGATTTGGC-3'; 3' primer, 5'-
CATGTAGGCCATGAGGTCCACCAC-3') were synthesized with a DNA synthesizer (model 380A, Applied Biosystems, Inc.)
and purified by gel filtration. For each reaction, 1 μg of poly(A)+ RNA from representative tissues was reverse-transcribed
to cDNA, and the resulting cDNA was subjected to 20 to 40 cycles of PCR using Takara Taq DNA polymerase (Takara,
Japan) under the following conditions: 94 °C for 60 s → 48 °C for 30 s → 74 °C for 60 s. Under these
conditions, the amplified signal derived from the genomic DNA encoding enteropeptidase was around 1.6 kb in size. The
PCR products were electrophoresed through a 1.0% agarose gel in 1× TAE buffer and visualized by ethidium bromide
staining.

RESULTS AND DISCUSSION 

Approximately 5 × 10^ clones were screened by hybridization with a full-length porcine
enteropeptidase cDNA. Over 500 clones were identified as positive for the probe. Among these
clones, 50 clones hybridized positively with a 0.6-kb EcoRV fragment representing the NH2-terminal
FIG. 1. Restriction map and sequencing strategy of a rat enteropeptidase cDNA clone (REK#7). Deletion mutants
constructed from subcloned fragments were used for nucleotide sequencing, and sequencing was done in both directions as
described in Materials and Methods. Arrows indicate the direction and extent of sequencing of fragments subcloned in
pBluescript. Lines indicate the 5'- and 3'-noncoding regions; a closed box indicates the putative internal signal sequence of
proenteropeptidase. Open boxes indicate the coding region, including the M, H and L-chains of mature enteropeptidase.
domain of porcine enteropeptidase. These clones were isolated by repeated purification. The
restriction site map constructed for these clones revealed that their structures are basically the same,
and nucleotide sequencing of both ends disclosed a common nucleotide sequence. One
clone (REK#7) was found to contain the entire coding region for rat enteropeptidase. The
restriction map and the sequence analysis strategy of the clone are shown in Fig. 1. The resulting nucleotide
sequence and the deduced amino acid sequence of rat enteropeptidase are presented in Fig. 2. The
analyzed cDNA clone was 3585 base pairs (bp) long, including the 5'-noncoding region (166 bp),
the coding nucleotide sequence (3,174 bp) and the 3'-noncoding region (245 bp). A typical
polyadenylation signal was present at base pair position 3554. The second methionine codon, at
nucleotides 167–169 in the open reading frame, meets the criteria for the translation initiation
site (11). Thus, the cDNA encoding rat enteropeptidase predicts a molecule of 1058 amino
acid residues (Mr = 117,700). Recently, we purified the enzyme from porcine duodenal mucosa
and structurally characterized it. In addition, we have cloned and analyzed the cDNA coding for the
protein (7). The primary structures of the rat and porcine enzymes are relatively conserved: 77%
identical in the nucleotide sequence and 71% in the encoded amino acid sequence. Comparison
of the rat cDNA sequence with the porcine one indicated that the enzyme is originally synthesized
as a single-chain precursor and processed into a three-chain enzyme rather than the heterodimeric
enzyme previously reported in other species (2,3). The NH2-terminal sequences of the mini (M),
heavy (H), and light (L) chains are deduced to start at positions 53, 119, and 819, respectively, thus
leading to three chains consisting of 66 (Mr = 7,700), 700 (Mr = 77,700), and 240
(Mr = 26,800) amino acid residues. A hydrophobic domain of 25 amino acid residues
precedes the NH2-terminus in the rat proenteropeptidase sequence (double-underlined
region, positions 19 to 43). Although there is a one-amino-acid insertion (Ala at position 52) in
the prepeptide sequence compared with other species (6,7), the hydrophobic segment is observed
in common, probably serving as an internal signal sequence. While we were preparing the
manuscript, the sequence of the cDNA encoding human enteropeptidase was reported, suggesting a
possible two-chain structure for the human enzyme (8). However, it is noteworthy that in
addition to the H and L-chains, a sequence similar to the rat and porcine M-chains is also observed
in the human sequence. The homology of this region is particularly high (88% vs. porcine and 83%
vs. human enzymes) compared with that in other regions (64–68% in the H-chain, 77–78% in the
L-chain). Thus, it is highly probable that human enteropeptidase is also a three-chain enzyme.
Among these three chains, the homology of the H-chain is the lowest, owing to insertions/deletions of
variable length around the Ser/Thr-rich regions, which are potential O-linked glycosylation sites. The rat
enzyme has 7 insertions (18 amino acids in total), and 50% of the inserted amino acids are Ser and
Thr residues, which are probably involved in O-linked carbohydrate attachment. The rat enzyme is
therefore considered to be the most O-linked carbohydrate-rich enteropeptidase among the
previously reported species. Furthermore, some of these inserted amino acids give rise to two additional
potential N-glycosylation sites, leading to heavy glycosylation of the region. The number of
potential N-linked glycosylation sites varies by species (rat 20, human 18, bovine
19 and porcine 22), but their positions are almost conserved. These carbohydrate moieties
presumably protect the enzyme from other digestive proteases in the intestinal contents.
The variation in glycosylation sites among species may be related to divergence in the
intestinal luminal environment and in the physiology of digestion.
The common basic structure of the catalytic domain of serine proteases is also observed in the rat

FIG. 2. Nucleotide and deduced amino acid sequences of the rat enteropeptidase cDNA clone. The double-underlined
sequence indicates a putative internal signal sequence. Boxed domains labeled (M), (H) and (L) are the deduced regions
corresponding to the M, H and L-chains of the mature enzyme, respectively. The underlined sequence at 665–802 bp is the
variable, Ser/Thr-rich region, including 18 amino acid residues of insertions observed in the rat enzyme. Potential
N-linked glycosylation sites are indicated by closed boxes.
L-chain. Consistent with previous data indicating that enzyme activity is attained by the
L-chain alone, the homology of this region is high among different species (77–78%). There is,
however, an insertion of 4 amino acid residues in the sequence next to the catalytic triad of serine
proteinases, whereas the three basic amino acid residues that are important for maintaining the substrate
specificity for trypsinogen are well conserved.
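The length bookkeeping in the preceding paragraph can be checked mechanically: the three cDNA segments should sum to the 3585-bp clone, the 3174-bp coding sequence corresponds to 1058 codons, and each mature chain runs from its deduced start to the residue before the next chain starts. The sketch below assumes inclusive 1-based residue numbering and that the L-chain runs to the last residue (1058); both are reading assumptions rather than statements from the paper, and the constant and function names are illustrative.

```python
# Segment lengths of the rat enteropeptidase cDNA clone (REK#7), in bp.
FIVE_PRIME_UTR = 166
CODING = 3174
THREE_PRIME_UTR = 245
PROTEIN_LENGTH = 1058  # residues in proenteropeptidase

# Deduced NH2-terminal start positions of the mature chains (1-based).
CHAIN_STARTS = {"M": 53, "H": 119, "L": 819}


def chain_lengths(starts, protein_length):
    """Length of each chain, assuming it ends just before the next chain starts."""
    ordered = sorted(starts.items(), key=lambda item: item[1])
    lengths = {}
    for i, (name, start) in enumerate(ordered):
        end = ordered[i + 1][1] - 1 if i + 1 < len(ordered) else protein_length
        lengths[name] = end - start + 1  # inclusive residue range
    return lengths


if __name__ == "__main__":
    print("clone length:", FIVE_PRIME_UTR + CODING + THREE_PRIME_UTR)  # 3585
    print("codons in coding sequence:", CODING // 3, "remainder", CODING % 3)
    print("hydrophobic segment 19-43:", 43 - 19 + 1, "residues")
    print("chain lengths:", chain_lengths(CHAIN_STARTS, PROTEIN_LENGTH))
```

Under these assumptions the computed chain lengths (66, 700 and 240 residues) and the 25-residue hydrophobic segment agree with the figures given in the text.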

The expression of the enteropeptidase gene in various rat tissues was examined by Northern blot
analysis using the cloned full-length cDNA as a probe. As shown in Fig. 3, a 4.4-kb signal of
enteropeptidase mRNA was observed only in the duodenum, but not in the other parts of the
gastrointestinal tract from the esophagus to the colon, nor in other organs such as the brain, heart,
lung, liver, kidney and spleen. Since a comparable signal for G3PDH mRNA was observed in all
RNA samples analyzed, it is evident that the paucity of enteropeptidase mRNA in the jejunum
and ileum was not caused by RNA degradation. There is controversy about the
distribution of enteropeptidase: some previous reports indicated that the enzyme is localized in
the duodenum (12), while others indicated its distribution throughout the small intestine (9).
Thus, to measure low levels of enteropeptidase gene expression semiquantitatively, we
employed the RT-PCR method and selected a primer set and amplification conditions with high
sensitivity and low background. Three different numbers of PCR cycles were used for semiquantitative
estimation. The RT-PCR result for the RNA samples used in the Northern blotting is shown in Fig. 5. The PCR
product had a molecular size of 0.5 kb, corresponding to the expected product of 491 bp, and was
shown to hybridize with the rat enteropeptidase cDNA by Southern blotting (data not shown). A
strong signal was observed in the duodenum and weak signals in the jejunum and ileum. The
signal detected in the ileum at 34 cycles was weaker than that of the duodenum at 30 cycles. Thus,
the mRNA level in the duodenum is considered to be at least 10 times higher than that in the distal
part of the small intestine, the ileum end. These results indicate that the gene is expressed
along the entire small intestine, although the level of expression is low in the distal segment.
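The "at least 10 times" estimate follows from exponential amplification: during the exponential phase the product is roughly N0·E^n, where E is the per-cycle amplification factor (2 for perfect doubling) and n the cycle number. If the ileum product after 34 cycles is still weaker than the duodenum product after 30 cycles, the ileum template must be lower by more than E^(34−30). The sketch below tabulates that bound for a few assumed efficiencies; the efficiency values and the function name are assumptions, since the paper does not report amplification efficiencies.

```python
def min_fold_difference(cycles_low_template, cycles_high_template, efficiency=2.0):
    """Lower bound on the template ratio implied by comparing two RT-PCR signals.

    If a sample amplified for `cycles_low_template` cycles still gives a weaker
    product than another sample amplified for `cycles_high_template` cycles,
    then (assuming both reactions are in the exponential phase with the same
    per-cycle amplification factor) the second sample started with more than
    efficiency ** (cycle difference) times as much template.
    """
    if not 1.0 < efficiency <= 2.0:
        raise ValueError("per-cycle amplification factor must be in (1, 2]")
    delta = cycles_low_template - cycles_high_template
    return efficiency ** delta


if __name__ == "__main__":
    for e in (2.0, 1.9, 1.8):
        fold = min_fold_difference(34, 30, e)
        print(f"E = {e}: duodenum/ileum template ratio > {fold:.1f}")
```

A four-cycle offset gives a 16-fold bound at perfect doubling and still about 10.5-fold at E = 1.8; the bound only drops below tenfold if the efficiency falls under roughly 1.78 per cycle, so the text's "at least 10 times" is a conservative reading.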

Previous studies revealed relatively high enzyme activity throughout the rat small intestine (9). Analysis
of our samples by the same assay also gave essentially the same result (Fig. 4A). However, that method
was shown to measure the coexisting aminopeptidase activity as well (13). By including 2 mM EDTA in
the reaction buffer, aminopeptidase activity could be completely abolished, while enteropeptidase
activity was not much affected, at least 80% of the activity remaining (unpublished data). Thus, an
approximate estimate of the enteropeptidase level could be obtained by the method used in the presence of
under high-stringency conditions using the rat enteropeptidase cDNA as a probe. The lines on the left indicate the
positions of the 28S and 18S ribosomal RNAs. The results of rehybridization of the filter with glyceraldehyde-3-phosphate
dehydrogenase (G3PDH) cDNA are shown at the bottom.
FIG. 4. Enteropeptidase activity along the rat small intestine. The small intestine from the duodenum to the ileum end was
divided into 8 equal segments, and the activity in each segment was measured as described in Materials and Methods
(A: without EDTA; B: with 2 mM EDTA in the reaction). The value for each segment indicates the percentage of
enzyme activity relative to that in the duodenum (segment No. 1), taken as 100%.



2 mM EDTA, and the result of the measurements in the small intestine is shown in Fig. 4B. This
indicates the presence of high enzyme activity in the proximal segment of the small intestine, while
no enzyme activity was detected in the distal segment despite the high sensitivity of the method.
Taken together, the above results clearly indicate that the biosynthesis of enteropeptidase is
regulated region-specifically at both the transcriptional and translational levels, and that the main
site of synthesis is the proximal segment of the small intestine, the duodenum, where pancreatic
secretions join the intestinal contents. The distribution of the mRNA and of enteropeptidase
activity strongly indicates the importance of the proximal small intestine in the physiology of
intestinal protein digestion regulated by the enzyme.

FIG. 5. Enteropeptidase mRNA expression detected by RT-PCR in the rat esophagus (Es), stomach (St), duodenum
(Du), jejunum (Je), ileum (Il), colon (Co) and brain (Br). The amount of each cDNA sample included in the reaction was
adjusted to the same quantity according to G3PDH mRNA expression. Three successive cycle numbers were employed to
confirm exponential amplification. Primers used for amplification were as follows: (A) G3PDH; (B) and (C) rat
enteropeptidase H-chain-specific primers.
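Fig. 4 reports each of the 8 intestinal segments as a percentage of the duodenal segment (No. 1). A minimal sketch of that normalization follows, together with the worst-case correction implied by the statement that at least 80% of enteropeptidase activity survives 2 mM EDTA. The function names and the raw activity values are hypothetical, invented for illustration only.

```python
def percent_of_first_segment(activities):
    """Express each segment's activity as a percentage of segment No. 1."""
    reference = activities[0]
    if reference <= 0:
        raise ValueError("segment 1 activity must be positive")
    return [100.0 * a / reference for a in activities]


def max_true_activity(measured_with_edta, min_retained_fraction=0.8):
    """Upper bound on EDTA-free enteropeptidase activity.

    Assumes EDTA abolishes aminopeptidase activity and leaves at least
    `min_retained_fraction` of the enteropeptidase activity intact.
    """
    return measured_with_edta / min_retained_fraction


if __name__ == "__main__":
    # Hypothetical fluorescence rates for 8 equal segments, duodenum first.
    with_edta = [50.0, 30.0, 12.0, 5.0, 2.0, 0.0, 0.0, 0.0]
    print([round(p, 1) for p in percent_of_first_segment(with_edta)])
    print("segment 1, corrected upper bound:", max_true_activity(with_edta[0]))
```

Because every segment is divided by the same reference, a uniform EDTA loss of up to 20% leaves the percentage profile unchanged; it only matters when absolute activities are compared.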

In addition, faint signals of enteropeptidase mRNA were also observed in the stomach, colon and
brain at 40 cycles (Fig. 5C). Enzyme activity is undetectable in these organs, and the physiological
importance of these findings remains to be elucidated. However, these findings are interesting
in the context of previous reports indicating the presence of trypsin-like serine proteases
in these tissues (14, 15). Trypsin-like serine proteases play important roles in many biological
processes. In the human brain, in particular, they are considered to be involved in the pathogenesis
of Alzheimer's disease through a role in β-amyloid production (15). Thus, the observed distribution
of the mRNA may indicate a role for enteropeptidase in the processing of bioactive peptides by
regulating the activity of trypsin-like proteases.
Previously we isolated a trypsin-like enzyme, designated
human airway trypsin-like protease, from the sputum of
patients with chronic airway diseases. This paper
describes the cDNA cloning, characterization of the
primary protein structure deduced from the cDNA, and
gene expression of this enzyme in various human tissues.
We obtained an entire 1517-base pair cDNA sequence
with an open reading frame encoding a polypeptide
of 418 amino acid residues. The polypeptide consisted
of a 232-residue catalytic region and a 186-residue
noncatalytic region with a hydrophobic putative
transmembrane domain near the NH2 terminus. The
polypeptide was suggested to be a type II integral membrane
protein in which the COOH-terminal catalytic region is
extracellular. Therefore, this protein is thought to be
synthesized as a membrane-bound precursor and to mature
to a soluble and active protease by limited proteolysis.
It showed 29–38% identity in the sequence of the
catalytic region with human hepsin, enteropeptidase,
acrosin, and mast cell tryptase. The noncatalytic region
had little similarity to other known proteins. In Northern
blot analysis, a transcript of 1.9 kilobases was detectable
most prominently in the trachea among 17 human
tissues examined.
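The abstract locates a hydrophobic putative transmembrane domain near the NH2 terminus. A classic way to flag such a segment from sequence alone is a Kyte-Doolittle hydropathy profile: average the residue hydropathy values over a sliding window (about 19 residues for membrane-spanning helices) and look for windows with a high mean. The paper does not say which method was used, so this is only an illustration of the general technique; the function names are illustrative and the toy sequence is hypothetical, not the HAT sequence.

```python
# Kyte-Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window; entry i covers sequence[i:i + window]."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one residue
        profile.append(total / window)
    return profile


def most_hydrophobic_window(sequence, window=19):
    """Return (0-based start, mean score) of the highest-scoring window."""
    profile = hydropathy_profile(sequence, window)
    if not profile:
        raise ValueError("sequence is shorter than the window")
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, profile[start]


if __name__ == "__main__":
    # Hypothetical toy sequence: charged N-terminal tail, a 20-residue
    # hydrophobic stretch, then a polar region.
    toy = "MKSDEKR" + "LIVALLFAVLIGLAVLLIAV" + "KDESRKETSQNG"
    start, score = most_hydrophobic_window(toy)
    print(f"most hydrophobic 19-residue window starts at residue {start + 1}, mean {score:.2f}")
```

Kyte and Doolittle proposed that a 19-residue window averaging above roughly 1.6 suggests a membrane-spanning segment. The same profile would also be expected to highlight the 25-residue hydrophobic segment (positions 19–43) described for rat proenteropeptidase.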



Many previous investigations have indicated that proteases
released from immunoinflammatory cells participate in the
pathogenesis of several kinds of respiratory diseases. For instance,
neutrophil elastase has been shown to be intimately related to
the pathologic states of pulmonary emphysema (1, 2), cystic
fibrosis (3, 4), interstitial pneumonia (5), and adult respiratory
distress syndrome (6) through destruction of extracellular
matrix components, such as elastin, of alveolar and bronchial
tissues. Mast cells, which abound in airway mucosa and in the
alveolar wall, release trypsin-like protease (tryptase) and
chymotrypsin-like protease (chymase) into extracellular spaces
during degranulation (7). Tryptase has the potential to stimulate
smooth muscle, fibroblasts, and tissue turnover (8). Different
substrates for chymase (9–11) indicate the potential involvement
of the enzyme in a variety of processes related to the
inflammatory response. Recently it was revealed that chymase
from human mast cells selectively converted big endothelins to
trachea-constricting peptides (12). These effects of the two
mast cell proteases have attracted considerable attention as
pathogenic determinants and therapeutic targets in
bronchial asthma and allergic inflammation. Elastase
released from alveolar macrophages has also been suggested to
contribute to the pathogenesis of pulmonary emphysema by
degrading matrix components of alveolar walls (13, 14).

However, there are very few reports dealing with the func- 
tions and roles of proteases secreted from respiratory tissues, 
such as secretory glands or surface epithelial cells of the air- 
way. Kido and co-workers (15, 16) found a novel trypsin-like 
protease that is secreted from rat Clara cells, secretory cells 
localized to the distal airway only. The protease, named 
tryptase Clara, was shown to enhance the infectivity of influ- 
enza and Sendai viruses (17), although its physiological role is 
unknown. 

Previously, we found trypsin-like activity in the sputum of 
patients with chronic airway diseases and isolated a novel 
trypsin-like protease from the sputum, designated human airway
trypsin-like protease (HAT)¹ (18). Gel filtration studies
showed that HAT was a monomeric enzyme with an apparent 
molecular mass of 27 kDa. Immunohistochemical studies 
showed that HAT was localized mainly in cells of submucosal 
serous glands of the bronchi and trachea. These results indi- 
cate that HAT is released from the submucosal serous glands 
onto mucous membrane, at least in patients with chronic air- 
way diseases. 

In this paper, we report the cloning of HAT cDNA, the 
primary structure of this enzyme and characterization of the 
polypeptide deduced from the nucleotide sequence of the cDNA, 
and results of analysis of expression of HAT mRNA in various 
human tissues. The primary structure of HAT was compared 
with that of other known serine proteases. 

EXPERIMENTAL PROCEDURES 

Materials — Human trachea QUICK-Clone™ cDNA, human trachea poly(A)+ RNA, human trachea λgt10 cDNA library (oligo(dT)- and random-primed), 5'-RACE kit, human multiple tissue Northern blots, and human β-actin cDNA were purchased from CLONTECH Laboratories Inc. (Palo Alto, CA). Taq DNA polymerase was from Promega Corp. (Madison, WI). SureClone™ ligation kit, dNTP, and plasmid vector pUC18 were from Amersham Pharmacia Biotech. Avian myeloblastosis virus reverse transcriptase and RNase inhibitor were from Boehringer Mannheim. Restriction endonucleases, random primer labeling kit, and Escherichia coli JM109 were from Takara Shuzo Co., Ltd. (Otsu, Japan). Nylon membrane Hybond™-N+ for blotting and [α-³²P]dCTP for probe labeling in hybridization were from Amersham. Denhardt's solution and salmon sperm DNA were from Wako Pure Chemical Industries Ltd. (Osaka, Japan). Qiagen lambda kit for purification of phage DNA was from Qiagen GmbH (Hilden, Germany). Oligonucleotide purification cartridge column and DyeDeoxy™ terminator cycle sequencing kit for DNA sequencing were from Applied Biosystems Inc. (Foster City, CA).

¹ The abbreviations used are: HAT, human airway trypsin-like protease; PCR, polymerase chain reaction; RACE, rapid amplification of cDNA ends; bp, base pair(s); kb, kilobase or kilobase pair.

DNA Amplification by Polymerase Chain Reaction (PCR) — PCR was performed according to the procedure described by Sambrook et al. (19). Oligonucleotides used as PCR primers were synthesized on a DNA/RNA synthesizer (Applied Biosystems Inc., model 394) and purified on oligonucleotide purification cartridge columns. Unless otherwise stated, PCR was carried out by adding 15 pmol of each primer and an appropriate amount of template DNA to 20 µl of PCR buffer (10 mM Tris-HCl, pH 9.0, 50 mM KCl, 1.5 mM MgCl2, 1% Triton X-100) containing 0.5 units of Taq DNA polymerase and 0.2 mM dNTP. The reaction was run in a DNA thermal cycler (Perkin-Elmer Corp.) for 35 cycles of 1-min denaturation at 94 °C, 1.5-min annealing at 57 °C, and 2-min extension at 72 °C.
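The default cycling conditions can be captured as a small data structure, which also makes the total programmed time explicit. This is a hypothetical Python sketch (the class names are ours, not the authors'); it counts only the three stated hold steps and ignores ramp times and any initial denaturation or final extension, none of which the text specifies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold within a PCR cycle."""
    name: str
    temp_c: float
    minutes: float


@dataclass(frozen=True)
class CyclingProgram:
    """A fixed number of identical cycles (hypothetical representation)."""
    cycles: int
    steps: tuple[Step, ...]

    def hold_minutes(self) -> float:
        """Total time spent at the programmed temperatures, ignoring ramps."""
        return self.cycles * sum(step.minutes for step in self.steps)


# Default program from the PCR paragraph: 35 cycles of 94, 57 and 72 °C holds.
DEFAULT_PCR = CyclingProgram(
    cycles=35,
    steps=(
        Step("denaturation", 94, 1.0),
        Step("annealing", 57, 1.5),
        Step("extension", 72, 2.0),
    ),
)

print(DEFAULT_PCR.hold_minutes())  # 157.5
```

A different program, such as the 0.75/0.75/2-min profile used for 5'-RACE, only needs a different `steps` tuple; its cycle number is not given in the text, so it is not encoded here.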

Subcloning of DNA Fragments — DNA fragments amplified by PCR were subcloned with the SureClone™ ligation kit: the fragments were blunted with Klenow fragment, inserted into the SmaI site of plasmid vector pUC18, and introduced into E. coli JM109 by Hanahan's method (20). For subcloning the insert DNA of λgt10 phage clones, phage DNA was purified using the Qiagen lambda kit, and the insert DNA was excised with EcoRI and inserted into the EcoRI site of plasmid vector pUC18. E. coli JM109 was transformed as described above. Plasmid DNA was isolated from each transformant by the alkaline lysis procedure (21) with minor modifications.

Analysis of DNA and Amino Acid Sequence — The nucleotide sequence of the DNA inserted into plasmid vector pUC18 was analyzed with an automated DNA sequencer (Applied Biosystems Inc., model 373) using the DyeDeoxy™ terminator cycle sequencing kit. Both strands of all clones were completely sequenced. Hydropathy of the amino acid sequence was analyzed (22) with the Genetyx program package (Software Development Co. Ltd., Tokyo, Japan). A computer survey of the National Biomedical Research Foundation (Washington, D.C.) and SWISS-PROT (European Bioinformatics Institute, Geneva, Switzerland) data banks for amino acid sequence similarity between HAT and other known proteins was carried out with Teijin Systems Technology Ltd. (Yokohama, Japan) using the MPsrch program, which is a modification of the method of Smith and Waterman (23).
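The MPsrch search is described as a modification of the Smith-Waterman method. The core of that method is a dynamic-programming recurrence for the best local alignment score. The Python sketch below is the textbook recurrence with a linear gap penalty and arbitrary match/mismatch scores chosen for illustration. It is not MPsrch, and real protein database searches score residue pairs with a substitution matrix rather than a single match/mismatch value.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b (linear gap penalty).

    h[i][j] holds the best score of a local alignment ending at a[i-1] and b[j-1];
    clamping at 0 lets an alignment start anywhere, which is what makes it local.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # a[i-1] aligned to a gap in b
            left = h[i][j - 1] + gap    # b[j-1] aligned to a gap in a
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


# HAT residues 361-373 as read from Fig. 1, searched with a made-up query
# containing the GDSGGP motif around the active-site serine.
print(smith_waterman("VDACQGDSGGPLV", "CQGDSGGP"))  # 16: all 8 query residues match
```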

Amplification of a Partial cDNA Fragment — In a previous report (18), we showed that the sequence of the 20 NH2-terminal amino acids of native HAT purified from the sputum of patients with chronic airway diseases was ILGGTEAEEGSWPWQVSLRL (amino acids 187-206 in Fig. 1). Based on this amino acid sequence, we designed and synthesized two degenerate PCR primers, 5'-ATCYTNGGRGGNACNGAGGC-3'² (sense) and 5'-ARKCKMAGGCTSACYTG-3'² (antisense), to obtain by PCR the 59-bp cDNA fragment encoding the first 19 residues of the NH2-terminal amino acid sequence. PCR was carried out in a reaction mixture containing 5 pmol of each primer and 1 ng of cDNA derived from human trachea (QUICK-Clone™ cDNA). The amplified DNA fragment was then subcloned and sequenced as described above. Sequence analysis showed that this PCR produced a 59-bp DNA fragment encoding the 19-residue amino acid sequence corresponding to the NH2 terminus of the purified HAT.
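The degenerate primers can be checked mechanically. The Python sketch below expands the antisense primer using the standard IUPAC ambiguity codes, counts its degeneracy, and takes its reverse complement to recover the sense-strand sequence it anneals to. It deliberately covers only the antisense primer: the footnote defines the sense primer's N as a mixture that includes inosine, which a plain A/C/G/T expansion cannot represent. The last line reproduces the 59-bp amplicon length from the codon count.

```python
from __future__ import annotations

from itertools import product

# Standard IUPAC codes needed for the antisense primer (N is left out on purpose).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
}
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "K": "M", "M": "K", "S": "S", "W": "W",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for base in primer:
        n *= len(IUPAC[base])
    return n


def expand(primer: str) -> list[str]:
    """All concrete sequences represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


def reverse_complement(seq: str) -> str:
    """Reverse complement, mapping each ambiguity code to its complement code."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


antisense = "ARKCKMAGGCTSACYTG"
print(degeneracy(antisense))          # 64
print(reverse_complement(antisense))  # CARGTSAGCCTKMGMYT
# Read as codons: CAR (Gln) GTS (Val) AGC (Ser) CTK (Leu) MGM (Arg codons AGA, CGA,
# CGC, plus the Ser codon AGC) and YT, the first two bases of the Leu20 codon.

# Amplicon: 19 complete codons plus the first two bases of codon 20.
print(19 * 3 + 2)  # 59
```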

Amplification of cDNA by 3'-Rapid Amplification of cDNA Ends (RACE) — To obtain a cDNA containing the nucleotide sequence downstream of the 59-bp DNA fragment, we employed the 3'-RACE method developed by Frohman et al. (24). Two sense primers, designed and synthesized based on the nucleotide sequence of the 59-bp cDNA fragment, were used to amplify the cDNA specifically and efficiently. First, single-stranded cDNAs were synthesized by reverse transcription at 42 °C for 60 min in 20 µl of reaction buffer (50 mM Tris-HCl, pH 7.6, 60 mM KCl, 10 mM MgCl2, 1 mM dithiothreitol) containing 10 ng of human trachea poly(A)+ RNA, 115 pmol of (dT)17-adapter primer 5'-GACTCGAGTCGACATCGA(dT)17-3', 25 units of RNase inhibitor, 1 mM dNTP, and 40 units of avian myeloblastosis virus reverse transcriptase. One-tenth of the reaction mixture was used as the template in the first-round PCR, in which 5'-ATCTTGGGGGGGAGGGAGGGTGA-3' and the adapter primer 5'-GAGTGGAGTCGAGATCGAT-3' were used as the sense and antisense primers, respectively. For further amplification of the cDNA, the second-round PCR was carried out using one-fortieth of the first-round PCR reaction mixture as the template, with 5'-GAGGCTGAGGAGGGAAGCTGGGCGT-3' (nucleotides 635-659 in Fig. 1) and the (dT)17-adapter primer described above as the sense and antisense primers, respectively. The cDNA amplified by 3'-RACE was then subcloned and sequenced.

² Y represents T or C; N represents C or I (inosine); R represents G or A; K represents G or T; M represents A or C; S represents G or C.

Screening of cDNA Library — Plaque hybridization against the human trachea cDNA library was performed according to the standard procedure (19). The DNA fragment obtained by 3'-RACE was labeled by the random prime method (25) using [α-³²P]dCTP and the random primer labeling kit. Using this probe, 1 × 10⁶ plaques derived from the human trachea λgt10 cDNA library were screened by hybridization as follows. The plaque blots were hybridized with the probe at 65 °C overnight (16-20 h) in a solution containing 5× SSPE buffer (0.75 M NaCl, 50 mM NaH2PO4, 5 mM EDTA, pH 7.4), 5× Denhardt's solution, 0.1% SDS, and 100 µg/ml denatured salmon sperm DNA. The blots were then washed twice at 65 °C for 20 min with 0.1× SSPE buffer containing 0.1% SDS. Five positive clones were selected and plaque-purified, and their insert DNAs were then subcloned and sequenced.
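The hybridization and wash stringencies follow from simple scaling of the 5× SSPE recipe given above. The sketch assumes, as is usual, that every component scales linearly with the "×" strength. The 20× stock used in `stock_volume` is a hypothetical lab convention, not something the text specifies.

```python
from __future__ import annotations

# 5x SSPE as given in the screening paragraph (molar concentrations).
FIVE_X_SSPE = {"NaCl": 0.75, "NaH2PO4": 0.050, "EDTA": 0.005}

# Components scale linearly with strength, so 1x is the 5x recipe divided by 5.
ONE_X_SSPE = {name: conc / 5 for name, conc in FIVE_X_SSPE.items()}


def sspe(strength: float) -> dict[str, float]:
    """Molar concentration of each SSPE component at the given x-strength."""
    return {name: conc * strength for name, conc in ONE_X_SSPE.items()}


def stock_volume(final_strength: float, final_volume: float, stock_strength: float = 20.0) -> float:
    """Volume of concentrated stock to dilute, from C1 * V1 = C2 * V2."""
    return final_strength * final_volume / stock_strength


for strength in (5, 0.1):  # hybridization buffer and final wash buffer
    print(strength, {k: round(v * 1000, 2) for k, v in sspe(strength).items()}, "mM")
print(stock_volume(5, 100))  # 25.0 (ml of a 20x stock per 100 ml of 5x buffer)
```

The wash buffer therefore contains only about 15 mM NaCl, 50-fold less than the hybridization solution, which is what makes the 65 °C washes stringent.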

Amplification of cDNA by 5'-RACE — To obtain a cDNA containing the nucleotide sequence upstream of the region coding for native HAT, the cDNA was amplified using the 5'-RACE kit (24). Single-stranded cDNAs were synthesized by reverse transcription of 2 µg of human trachea poly(A)+ RNA using the antisense primer 5'-AGGTGGCAATGCAGTGACGAGGATT-3' (nucleotides 785-761 in Fig. 1). The single-stranded cDNAs were purified using the glass powder in the 5'-RACE kit after alkaline hydrolysis of the RNA in the reaction mixture. Using T4 RNA ligase, the AmpliFINDER™ anchor was ligated to the 3'-ends of the single-stranded cDNAs. PCR amplification (0.75 min at 94 °C, 0.75 min at 57 °C, and 2 min at 72 °C) was then carried out using
0.01 of the ligation mixture as template, with anchor primer 5'-CTG- 
GTTGGGGGCAGCTCTGAA<^TTCCAGAATCGATAG-3' and 5'-TGA- 
GCTGGTGTGAGGATGCACATGT-3' (nucleotides 741-717 in Fig. 1) as 
the sense and antisense primers, respectively. The cDNA amplified by 
5'-RACE was then subcloned and sequenced.

Expression and Purification of Recombinant HAT — A 1.3-kb BamHI-HindIII fragment containing the entire HAT cDNA was cloned into the transfer vector pBlueBacIII (Invitrogen, San Diego, CA) to generate pBacPHAT1. Recombinant HAT-expressing viruses were generated after co-transfection of Sf9 cells with pBacPHAT1 and wild-type AcMNPV DNA essentially as described by the manufacturer (Invitrogen). For baculovirus/insect cell expression (26), 800 ml of Tn5 (27) cells were then infected with the high-titer lysate for 72 h and harvested by centrifugation. The cell pellet was treated with 1% Triton X-100 for 1 h on ice and centrifuged at 100,000 × g for 1 h at 4 °C. From this infected cell lysate, the recombinant HAT was isolated by the sequential chromatographic procedures used for purification of native HAT, as described previously (18). SDS-polyacrylamide gel electrophoresis, immunoblotting, and degradation of fibrinogen by HAT were done as described (18).

Northern Blot Analysis — The expression level of HAT mRNA in various human tissues was examined by Northern blot analysis. To prepare the probe, the full-length HAT cDNA was ³²P-labeled by random priming (25). Northern blots of various human tissues, containing 2 µg of poly(A)+ RNA per lane, were hybridized under the same conditions as the library screening described above (except that the SDS concentration was 0.5%) and then washed. For the trachea blot, 2 µg of human trachea poly(A)+ RNA was resolved by 1% agarose-formaldehyde gel electrophoresis (28), transferred onto a Hybond™-N+ blotting membrane, and UV-cross-linked. X-ray films were exposed to the probed blots for 4 days at −80 °C with an intensifying screen, and the presence of HAT mRNA in each human tissue was evaluated. The blots were then stripped of the HAT cDNA probe by boiling in 0.5% SDS for 10 min and re-probed with ³²P-labeled human β-actin control probe as an internal standard for the amounts of RNA loaded.

RESULTS AND DISCUSSION 

[Fig. 1: nucleotide sequence of the 1517-bp HAT cDNA with the deduced 418-residue amino acid sequence]

Fig. 1. Nucleotide sequence of HAT cDNA and its deduced amino acid sequence. The nucleotide sequence of the HAT cDNA is shown along with the deduced amino acid sequence beginning with the first ATG codon. The stop codon (TAG) at the end of the coding sequence is marked with an asterisk. Nucleotides are numbered at the right margin and amino acids on the left. The NH2-terminal sequence obtained from the purified enzyme is underlined. The boxed amino acid sequence represents a potential transmembrane domain.

Cloning of HAT cDNA — Using a pair of highly degenerate oligonucleotide primers, the partial 59-bp cDNA fragment for HAT, which contained a nucleotide sequence coding for the NH2-terminal 19-residue amino acid sequence of the native HAT, was obtained by PCR amplification from human trachea cDNA. To extend this cDNA sequence toward the 3'-end, a 3'-RACE reaction was carried out. The resulting 0.9-kb amplified product was shown to encompass the entire nucleotide sequence of the 3' region, including the poly(A) tail of HAT cDNA (nucleotides 635-1517 in Fig. 1). The amino acid sequence deduced from this 0.9-kb fragment was shown to contain exactly the 15-amino acid sequence (amino acids 192-206 in Fig. 1) of the NH2-terminal 20-amino acid sequence of the native HAT. With this 0.9-kb cDNA fragment as a probe, 1 × 10⁶ clones of a human trachea λgt10 cDNA library were screened. Five of 28 independent positive clones were then subcloned and sequenced. The largest insert contained a 1323-bp cDNA sequence (nucleotides 133-1455 in Fig. 1) but was considered not to contain the entire nucleotide sequence of the 5' region of HAT cDNA. To obtain the missing sequence in the 5' region of HAT cDNA, a 5'-RACE reaction was carried out. The 5'-RACE procedure produced a 741-bp cDNA fragment (nucleotides 1-741 in Fig. 1). This product had a 609-bp sequence overlap (nucleotides 133-741 in Fig. 1) with the 5'-end of the largest insert obtained by library screening.
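The fragment bookkeeping in this paragraph reduces to interval arithmetic on Fig. 1 coordinates. The Python sketch below takes the coordinates from the text (1-based and inclusive; the helper names are ours), reproduces the 1323-bp clone length and the 609-bp overlap, and checks that the three fragments jointly cover nucleotides 1-1517 without a gap.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) in Fig. 1 numbering, 1-based and inclusive

FRAGMENTS: Dict[str, Span] = {
    "3'-RACE product": (635, 1517),
    "largest library clone": (133, 1455),
    "5'-RACE product": (1, 741),
}


def length(span: Span) -> int:
    """Number of nucleotides in an inclusive interval."""
    start, end = span
    return end - start + 1


def overlap(a: Span, b: Span) -> int:
    """Number of positions shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def covers(spans: List[Span], total: int) -> bool:
    """True if the union of the spans covers positions 1..total with no gap."""
    reach = 0  # rightmost position covered so far
    for start, end in sorted(spans):
        if start > reach + 1:
            return False
        reach = max(reach, end)
    return reach >= total


print(length(FRAGMENTS["largest library clone"]))  # 1323
print(overlap(FRAGMENTS["5'-RACE product"], FRAGMENTS["largest library clone"]))  # 609
print(covers(list(FRAGMENTS.values()), 1517))  # True
```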

Fig. 2. Immunoblotting of the native HAT and the recombinant HAT. Specific binding was analyzed using the antibody against a peptide corresponding to the NH2-terminal 19 amino acids of HAT as described previously (18). Lane 1, standard proteins; lane 2, purified native HAT (0.10 µg); lane 3, lysate of infected Tn5 cells derived from a 20-µl culture; and lane 4, purified recombinant HAT (0.10 µg).

Sequence and Structural Features of HAT cDNA — Analysis of the cDNA clones obtained by the successive procedures, including 3'-RACE, cDNA library screening, and 5'-RACE, showed a 1517-bp nucleotide sequence up to the poly(A) region (Fig. 1), which represented the HAT cDNA sequence. This nucleotide sequence contained one open reading frame, and the polypeptide deduced from the cDNA included the 20-residue NH2-terminal amino acid sequence of the native HAT (amino acids 187-206 in Fig. 1). The molecular mass of the polypeptide extending from the first of these 20 residues to the COOH terminus defined by the stop codon TAG (nucleotide 1316) was estimated to be 25,308 Da. This value is similar to the apparent molecular mass (27 kDa) estimated by gel filtration of the native HAT protein purified from sputum (18).
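A value such as 25,308 Da is obtained by summing residue masses over the deduced sequence. The sketch below does this with standard average residue masses; this table is our choice, since the text does not say which values or program were used. It assumes an unmodified chain with free termini. Because the full deduced sequence in Fig. 1 is not legible in this copy, the example runs only on the 20-residue NH2-terminal peptide reported in the text, and its output is not meant to reproduce the paper's figure.

```python
from typing import Dict

# Standard average residue masses in daltons (free amino acid minus one water).
AVG_RESIDUE_MASS: Dict[str, float] = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free NH2 and COOH termini


def average_mass(seq: str) -> float:
    """Average mass of an unmodified linear polypeptide with free termini."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


n_terminal_peptide = "ILGGTEAEEGSWPWQVSLRL"  # residues 187-206, from the text
print(f"{average_mass(n_terminal_peptide):.1f} Da")
```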

In the 5'-flanking region of this cDNA, an in-frame stop codon TAG was located at nucleotide 26. Four in-frame ATG codons were detectable between this stop codon and the region encoding the native HAT, but none of them satisfied the criteria for a Kozak consensus sequence (29). Therefore we could not determine the translational initiation site from the cDNA sequence alone. To determine the initiation site, we expressed recombinant HAT in a baculovirus/insect cell system using the HAT cDNA. The recombinant virus containing the HAT cDNA was isolated, and Tn5 insect cells were infected with the virus and then cultured. The lysate obtained by 1% Triton X-100 treatment of the infected cells was analyzed by immunoblotting with a rabbit antibody against a peptide corresponding to the NH2-terminal 19-amino acid sequence of the native HAT (18) as the primary antibody; the immunoblot indicated that the infected cells biosynthesized a protein with a molecular mass of 48 kDa as the main product (Fig. 2). The molecular masses of the polypeptides deduced from the nucleotide sequence initiating at each of the four ATG codons in the cDNA were 46,263, 32,933, 31,436, and 30,107 Da, respectively. The molecular mass of 46,263 Da is the closest to that of the recombinant protein expressed in the insect cells, suggesting that the ATG nearest the 5'-end (at nucleotide 62) is the initiation codon of HAT.
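The choice of initiation codon combines two checks: the sequence context of each ATG and the agreement between predicted and observed molecular mass. The Python sketch below encodes a simplified two-position version of Kozak's rule (a purine at -3 and G at +4), which is our simplification of the full consensus. It then picks the ATG whose predicted product is closest to the ~48-kDa band. The Kozak examples use made-up contexts, since the 5'-flanking sequence in Fig. 1 is not legible in this copy; the masses are those given in the text.

```python
from typing import List


def kozak_strength(context: str, atg_pos: int) -> str:
    """Classify an ATG by the two best-known Kozak positions, -3 and +4.

    context: nucleotide string; atg_pos: 0-based index of the A of the ATG.
    Returns 'strong' (purine at -3 and G at +4), 'adequate' (one of the two)
    or 'weak' (neither). This is a simplification of the full consensus.
    """
    if context[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    minus3 = context[atg_pos - 3] if atg_pos >= 3 else ""
    plus4 = context[atg_pos + 3] if atg_pos + 3 < len(context) else ""
    hits = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[hits]


def closest_start(predicted_da: List[int], observed_da: float) -> int:
    """Index of the candidate whose predicted mass is nearest the observed mass."""
    return min(range(len(predicted_da)), key=lambda i: abs(predicted_da[i] - observed_da))


# Hypothetical contexts (not from Fig. 1) showing the three outcomes.
print(kozak_strength("GCCACCATGG", 6))  # strong
print(kozak_strength("GCCACCATGA", 6))  # adequate
print(kozak_strength("TTTTCCATGA", 6))  # weak

# Predicted masses for the four in-frame ATGs, 5' to 3', versus the ~48-kDa product.
candidates = [46_263, 32_933, 31_436, 30_107]
print(closest_start(candidates, 48_000) + 1)  # 1, i.e. the ATG at nucleotide 62
```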

To demonstrate that the cloned enzyme has the same activity as the native HAT, the recombinant HAT expressed in the baculovirus/insect cell system was isolated in its active form. The minor product in Fig. 2, lane 3 was selectively isolated as the active recombinant HAT from the infected cell lysate by the sequential chromatographic procedures used for purification of the native HAT (18). The purified recombinant enzyme had a molecular mass of 28 kDa on SDS-polyacrylamide gel electrophoresis and the same 10 NH2-terminal residues as the native HAT. Immunoblotting also showed that the purified recombinant enzyme was the same size as the native HAT (Fig. 2). Like the native HAT, the recombinant HAT degraded fibrinogen, especially the α-chain (Fig. 3). These results established that the isolated cDNA clone encodes HAT.

Based on these results, the nucleotide sequence of the cDNA for HAT (Fig. 1) can be summarized as follows. The cDNA includes 1254 nucleotides coding for 418 amino acids and two untranslated sequences of 61 and 185 nucleotides at the 5'-end and 3'-end, respectively. In the 3'-untranslated region, there is a polyadenylation signal sequence, ATTAAA, at nucleotides 1478-1483, 17 nucleotides upstream of the poly(A) tail.
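The layout of the cDNA can be checked by simple arithmetic on the coordinates in the text. In the sketch below, the poly(A) start at nucleotide 1501 is our inference from the statement that ATTAAA (1478-1483) lies 17 nucleotides from the tail. With that reading, the 185-nt 3' segment must include the TAG stop codon for the parts to add up to 1517.

```python
# Coordinates from the text (1-based); POLY_A_START is our inference.
TOTAL_LENGTH = 1517      # cDNA up to the end of the poly(A) stretch
CDS_START = 62           # A of the initiating ATG
STOP_START = 1316        # T of the TAG stop codon
PAS_END = 1483           # last base of the ATTAAA signal (1478-1483)
POLY_A_START = PAS_END + 17 + 1  # 17 nt after the signal, i.e. 1501 (inferred)

utr5 = CDS_START - 1                      # nucleotides 1-61
coding = STOP_START - CDS_START           # 62-1315, stop codon excluded
codons = coding // 3
utr3 = POLY_A_START - STOP_START          # 1316-1500, stop codon included
poly_a = TOTAL_LENGTH - POLY_A_START + 1  # 1501-1517

print(utr5, coding, codons, coding % 3)   # 61 1254 418 0
print(utr3, poly_a)                       # 185 17
print(utr5 + coding + utr3 + poly_a)      # 1517
```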

Fig. 3. Degradation of human fibrinogen by the native HAT and the recombinant HAT. Hydrolysis reactions and SDS-polyacrylamide gel electrophoresis were done as described previously (18). For each reaction, 0.10 µg of HAT was used. Lane 1, standard proteins; lane 2, fibrinogen (blank control); lane 3, fibrinogen hydrolyzed by native HAT; lane 4, fibrinogen hydrolyzed by recombinant HAT.

Fig. 4. Hydropathy plot of the deduced amino acid sequence of HAT. The method of Kyte and Doolittle (22) was used with averaging over a window of 10 residues. Hydrophobic residues show positive values, whereas hydrophilic residues show negative values. Amino acid numbering begins with the start codon Met.

Analysis of Deduced Amino Acid Sequence of HAT — The open reading frame of HAT cDNA was thought to encode a polypeptide of 418 amino acid residues with a molecular mass of 46,263 Da. The NH2-terminal 20-amino acid sequence of the native HAT extends from Ile187 to Leu206 in the sequence of the deduced polypeptide (Fig. 1). This result indicates that the Arg186-Ile187 peptide bond in the HAT polypeptide must be cleaved for activation of HAT. This type of cleavage has been shown to be a relatively common step in the activation of many known serine protease zymogens (30, 31). Therefore it is likely that the HAT gene product is synthesized as a precursor protein consisting of a noncatalytic region of 186 amino acid residues (20,955 Da, amino acids 1-186 in Fig. 1) and a catalytic region of 232 amino acid residues (25,308 Da, amino acids 187-418 in Fig. 1), and that the precursor is converted to an active enzyme by limited proteolysis, like the conversion of trypsinogen to trypsin in the small intestine (32). The noncatalytic region contains two potential N-linked glycosylation sites, Asn-Asn-Ser and Asn-Pro-Ser, at Asn144 and Asn152, respectively.
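Potential N-linked glycosylation sites are usually located by scanning for the sequon Asn-X-Ser/Thr. The commonly used rule excludes Pro at the X position, whereas the text also lists Asn-Pro-Ser as a potential site, so the sketch below exposes that choice as a flag. The segment is residues 141-154 as read from the extracted Fig. 1 text. The last lines check the region sizes implied by cleavage of the Arg186-Ile187 bond.

```python
import re
from typing import List


def sequons(seq: str, first_residue: int, allow_pro: bool = True) -> List[int]:
    """Residue numbers of Asn in Asn-X-Ser/Thr motifs.

    A zero-width lookahead is used so that overlapping motifs (e.g. N-N-S-T)
    are all reported. With allow_pro=False, Pro at X is excluded (stricter rule).
    """
    x = "." if allow_pro else "[^P]"
    pattern = re.compile(f"(?=N{x}[ST])")
    return [m.start() + first_residue for m in pattern.finditer(seq)]


segment = "QWLNNSGNLEINPS"  # residues 141-154, read from Fig. 1
print(sequons(segment, 141))                   # [144, 152]
print(sequons(segment, 141, allow_pro=False))  # [144]

# Cleavage after Arg186 splits the 418-residue precursor into two regions.
precursor_length, cleavage_after = 418, 186
noncatalytic = cleavage_after                  # residues 1-186
catalytic = precursor_length - cleavage_after  # residues 187-418
print(noncatalytic, catalytic)  # 186 232
```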

A hydropathy plot (22) of the predicted amino acid sequence of the HAT precursor (Fig. 4) showed that a typical NH2-terminal signal sequence (33-35) is not present, but a single obvious hydrophobic region (amino acids 13-43 in Fig. 1) is present near the NH2 terminus. This hydrophobic region of 31 amino acid residues does not contain any charged amino acids and is flanked by charged amino acids (Arg12 and Asp44). This internal hydrophobic region is thought to correspond to a transmembrane domain that anchors the protein to the cell membrane (36). A generalized rule for eukaryotic transmembrane proteins (37, 38) holds that the difference in total charge between the 15-residue sequences on either side of the membrane-spanning hydrophobic region determines the orientation of the protein, with the more positive side facing the cytosol. In the precursor polypeptide deduced from HAT cDNA, the NH2-terminal side of the hydrophobic region had a net charge of +3, whereas the COOH-terminal side had a net charge of +1; the NH2-terminal side was thus 2 charge units more positive. This result suggests that the HAT precursor has an intracellular NH2-terminal tail of 12 amino acid residues facing the cytosol and an extracellular COOH-terminal region of 375 amino acid residues that contains the catalytic region. Therefore, the HAT precursor can be classified as a type II integral membrane protein (39, 40) and is thought to be synthesized as a membrane-bound precursor protein that is translocated to the cell surface, processed to a soluble form, and released.
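Both steps of this topology argument are easy to reproduce. The Python sketch below computes a Kyte-Doolittle profile averaged over a 10-residue window, as in Fig. 4. It also computes a crude net charge (Lys/Arg +1, Asp/Glu -1, His ignored) for the segments flanking the hydrophobic stretch at residues 13-43. The flanking segments are read from the extracted Fig. 1 text, whose first lines are partly damaged, so treat them as illustrative. Only 12 residues precede the hydrophobic region on the NH2-terminal side. Plotting programs differ on whether a window value is assigned to its first or central residue; here element i covers residues i+1 to i+10.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy index.
KD: Dict[str, float] = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy(seq: str, window: int = 10) -> List[float]:
    """Mean Kyte-Doolittle score per window; element i covers seq[i:i + window]."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


def net_charge(segment: str) -> int:
    """Crude net charge: +1 per Lys/Arg, -1 per Asp/Glu; His counted as neutral."""
    return sum(int(aa in "KR") - int(aa in "DE") for aa in segment)


n_side = "MYRPARVTSTSR"     # residues 1-12, before the hydrophobic stretch
c_side = "DQKSYFYRSSFQLLN"  # residues 44-58, the 15 residues after it
print(net_charge(n_side), net_charge(c_side))   # 3 1
print(net_charge(n_side) > net_charge(c_side))  # True: NH2 side predicted cytosolic
```

Applied to the full deduced precursor sequence, `hydropathy` would give the profile plotted in Fig. 4, with a single broad positive peak expected over residues 13-43.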

Because neither the precursor nor the intermediate form of HAT has been isolated and characterized, it is unknown whether the membrane-bound HAT is active on the cell surface. The mechanisms of expression and activation of many serine proteases have been clarified. The predicted maturation process of the HAT precursor described above is similar to that of Bacillus amyloliquefaciens subtilisin (41). Subtilisin is synthesized as a membrane-associated precursor (preprosubtilisin) and is released outside the cell after it is autocatalytically converted to an active form (42). Only mature subtilisin has been detected extracellularly (41). Likewise, the active HAT contained in sputum samples was detected extracellularly.

Fig. 5. Comparisons of the deduced amino acid sequence of the catalytic portion of HAT with those of other serine proteases. Identical amino acid residues are shaded, and the catalytic triad of histidine, aspartic acid, and serine is indicated by triangles. Hyphens represent gaps introduced to improve the alignment.

[Fig. 5 alignment panel: catalytic region of HAT (from residue 187) aligned with the corresponding regions of other serine proteases]

Fig. 6. Northern blot analysis of HAT mRNA in various human tissues. The blots were hybridized to the HAT cDNA probe (upper panel). The same filters were re-hybridized with β-actin probe as an internal standard for the amounts of RNA loaded (lower panel).

It is possible that the membrane-bound HAT, or the portion remaining in the membrane after release of the soluble HAT, is involved in important physiological processes on the cell surface through interaction with ligands, other proteins, or the surface. Recent reports have shown that some viruses and a bacterial toxin use cell surface proteases as receptors (43-47), indicating that these proteins can serve purposes beyond their intrinsic roles.

Homology of Amino Acid Sequence of HAT with Other Proteases — To find any similarity in primary structure between HAT and known proteins, we surveyed publicly available data banks. Previous investigators have shown that the serine protease family has a common catalytic site consisting of three amino acid residues, His, Asp, and Ser, which are joined by hydrogen bonds and act together as a catalytic triad, although they are located apart from each other in the primary structure of the enzyme (48). Based on these established facts, the catalytic site of HAT is thought to consist of residues His227, Asp272, and Ser368 (Fig. 5). When the amino acid sequence of HAT was compared with those of other serine proteases, the most striking similarity was found around this putative catalytic triad (Fig. 5). Six of the seven cysteine residues in the catalytic region of HAT were at positions identical to those in other serine proteases (Fig. 5). The deduced HAT precursor contained nine cysteine residues, and Cys20 was located in the predicted transmembrane domain. Based on the locations of the known disulfide bridges in other serine proteases (49), it is postulated that the other eight cysteine residues may form four disulfide bonds, at cysteine pairs 212/228, 337/353, and 364/393 in the catalytic region and at 173/292 between the noncatalytic region and the catalytic region.
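The disulfide assignment in this paragraph can be cross-checked against the region boundary at Ile187. The short sketch below uses the four proposed pairs from the text. It confirms that they use eight distinct cysteines, seven of which lie in the catalytic region, with a single bond (173/292) bridging the two regions. It checks only the internal consistency of the stated numbers, not the structural prediction itself.

```python
CATALYTIC_START = 187  # first residue of the mature enzyme (Ile187)
PROPOSED_PAIRS = [(212, 228), (337, 353), (364, 393), (173, 292)]

paired = [c for pair in PROPOSED_PAIRS for c in pair]
distinct = len(set(paired)) == len(paired)  # no cysteine used twice
in_catalytic = [c for c in paired if c >= CATALYTIC_START]
bridging = [p for p in PROPOSED_PAIRS
            if (p[0] >= CATALYTIC_START) != (p[1] >= CATALYTIC_START)]

print(len(paired), distinct)  # 8 True
print(len(in_catalytic))      # 7
print(bridging)               # [(173, 292)]
```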

The amino acid sequence of the catalytic region of HAT was homologous to those of other human serine proteases: 38% identity with hepsin (50), 32% with enteropeptidase (51), 30% with acrosin (52), and 29% with mast cell tryptase (53). Hepsin, whose catalytic region showed the highest similarity to that of HAT in this survey, is a cell surface protease widely expressed in various tissues, including liver, and is suggested to play a role in cell growth and the maintenance of cell morphology (54).
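Percent identity figures such as these depend on how gapped columns are counted, and the text does not state the convention used. The Python sketch below assumes one common choice: identical residues divided by the number of columns in which neither sequence has a gap, for two sequences that have already been aligned. The aligned strings in the example are made up for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Columns with a gap in either sequence are left out of the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


print(percent_identity("IVGGE-", "IVAG-S"))  # 75.0
```

Dividing by the full alignment length instead would give 50.0 for the same pair, which is one reason identity values from different programs are not always directly comparable.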

On the other hand, the amino acid sequence of the noncatalytic region of HAT showed no significant similarity to those of other proteins and contained neither a kringle nor an EGF-like domain, both of which are found in some proteases involved in the blood coagulation, fibrinolysis, and complement cascades (55). The function of this unique and relatively long noncatalytic portion of the HAT precursor is unknown.

Northern Blot Analysis — Previously, we showed immunohistochemically that HAT protein was expressed in the cells of submucosal serous glands of human bronchi and trachea (18). Serous glands are widely distributed in various human tissues. Therefore, multiple tissue Northern blot analysis was carried out to confirm that HAT mRNA is expressed in the human lower airway and to determine whether it is also expressed in other human tissues. As shown in Fig. 6, a 1.9-kb transcript was detected only in the trachea blot among the 17 different tissues examined, which included heart, brain, pancreas, lung, and liver. This mRNA size agrees fairly well with the length of the HAT cDNA established in the present work (1517 bp). In addition to the 1.9-kb mRNA, 3.0-kb and 0.9-kb signals were weakly detectable in the trachea and pancreas blots, respectively. These two transcripts may result from alternative splicing or polyadenylation or may represent cross-hybridizing mRNAs, but their nature is unknown. These results strongly suggest that HAT mRNA is more actively expressed in the lower airway, including the trachea, than in the other tissues examined, and they support our previous finding that HAT is localized in cells of submucosal serous glands of the trachea and bronchi.

Although the native HAT was found in the sputum of pa- 
tients with chronic airway diseases, HAT mRNA is thought to 
be expressed in the normal tissues of healthy subjects, because 
the trachea poly(A)+ RNA used for the Northern blot was
obtained from normal trachea tissue of three white male
subjects who died of trauma or acute heart failure. It will be
useful to compare HAT mRNA and protein expression levels
in patients with airway diseases with those in healthy
subjects to clarify the physiological and pathophysiological
significance of HAT in the airway. In the airway, various kinds of
proteins such as lysozyme (56), secretory IgA (57), and
secretory leukocyte protease inhibitor (58) are secreted from the
submucosal serous glands onto the mucous membrane and become
constituents of airway mucus or bronchial secretions (59).
These proteins play important roles in the host defense system
of the airways together with respiratory mucous glycoproteins,
which are secreted from mucous gland cells and goblet cells
(59). HAT may be released from the serous glands with these
proteins and play some biological role in the host defense
system on the mucous membrane, independently of or in
cooperation with other substances in airway mucus or bronchial
secretions.

In summary, it was confirmed through the present work that 
HAT is a novel trypsin-like serine protease by analyzing the 
primary structure of the polypeptide deduced from the nucleotide
sequence of its cDNA. However, the mechanism of activation
of the HAT precursor to the mature enzyme, the physiological
role of the enzyme, and biological significance of the noncata- 
lytic region of the precursor remain to be resolved. 
A novel cDNA has been identified from human heart
that encodes an unusual mosaic serine protease, desig- 
nated corin. Corin has a predicted structure of a type II 
transmembrane protein and contains two frizzled-like 
cysteine-rich motifs, seven low density lipoprotein re- 
ceptor repeats, a macrophage scavenger receptor-like 
domain, and a trypsin-like protease domain in the extra- 
cellular region. Northern analysis showed that corin 
mRNA was highly expressed in the human heart. In 
mice, corin mRNA was detected by in situ hybridization 
in the cardiac myocytes of the embryonic heart as early 
as embryonic day (E) 9.5. By E11.5-13.5, corin mRNA was
most abundant in the primary atrial septum and the 
trabecular ventricular compartment. Expression in the 
heart was maintained through the adult. In addition, 
mouse corin mRNA was also detected in the prehypertrophic
chondrocytes in developing bones. By fluorescent in situ
hybridization analysis, the human corin gene was mapped to
4p12-13, where a congenital heart disease locus, total
anomalous pulmonary venous return, had been previously
localized. The unique domain
structure and specific embryonic expression pattern 
suggest that corin may have a function in cell differen- 
tiation during development. The chromosomal localiza- 
tion of the human corin gene makes it an attractive 
candidate gene for total anomalous pulmonary venous 
return. 



Serine proteases are essential for a variety of biological proc- 
esses including food digestion, complement activation, and 
blood coagulation (1-3). In Drosophila, serine proteases are 
also involved in developmental pathways. For example, serine 
proteases encoded by the nudel, gastrulation defective, easier, 
and snake genes are key components of a proteolytic cascade 
that is critical for the establishment of the dorsal-ventral pat- 
tern in developing embryos (4-6). Genetic defects in these 
genes often lead to the disruption of the dorsal-ventral axis, 
resulting in embryonic lethality (7). 

Most serine proteases of the trypsin family are secreted 
proteins. Several members from this family have been identi- 
fied that contain an integral transmembrane domain. Hepsin, 
for example, is a serine protease expressed on the surface of 
hepatocytes. Structurally, hepsin is a type II transmembrane
protein with the transmembrane domain at its amino terminus 
and the protease domain at the carboxyl terminus exposed to 
the outside of the cell (8). In tissue culture studies, hepsin was
shown to contribute to hepatocyte growth (9). However, the 
physiological significance of the growth stimulating activity of 
hepsin remains unknown (10). In Drosophila, Stubble-stub- 
bloid protein, another transmembrane serine protease, shares 
structural similarities with hepsin (11). Genetic studies dem- 
onstrated that Stubble-stubbloid is essential for epithelial mor- 
phogenesis and development of the fruit fly. Defects in the 
Stubble-stubbloid gene cause malformation of legs, wings, and 
bristles. Most recently, other transmembrane serine proteases 
were isolated and cloned from human trachea and small in- 
testine (12, 13). The biological function of these newly discov- 
ered membrane-bound serine proteases has not yet been 
determined. 

In this study, we report the cloning of a cDNA from the 
human heart that encodes a novel transmembrane serine pro- 
tease, designated corin. Corin has a predicted structure of a 
type II transmembrane protein containing two frizzled-like 
cysteine-rich motifs, seven LDL¹ receptor repeats, a macrophage
scavenger receptor-like domain, and a trypsin-like protease
domain in the extracellular region. In situ hybridization
revealed that corin mRNA was expressed in the embryonic 
heart as early as E9.5, and the expression in the heart was 
maintained through the adult stage. In addition, corin mRNA 
was detected in prehypertrophic chondrocytes of the developing
bones. The unusual domain structures and specific expression
pattern suggested that corin may have a function in cell
differentiation during embryonic development.

EXPERIMENTAL PROCEDURES 

Materials — Human cancer cell lines, HEC-1-A (endometrium adenocarcinoma),
U2-OS (osteosarcoma), SK-LMS-1 (vulva sarcoma), RL95-2
(endometrium carcinoma), and AN3-CA (endometrium adenocarci- 
noma) were obtained from the American Type Culture Collection 
(ATCC). Human heart cDNA libraries and human and mouse multiple 
tissue Northern blots were purchased from CLONTECH (Palo Alto, 
CA). Mouse tissue sections used for in situ hybridization were pur- 
chased from Novagen (Madison, WI). Tissue culture media and supple- 
ments were from Life Technologies Inc. All other chemicals were ob- 
tained from Sigma. 

Isolation of Human Corin cDNA Clones — An expressed sequence tag 
(EST) clone was found in a human heart cDNA library from the Incyte 
EST data base that shared significant sequence homology with trypsin, 
indicating that the EST may encode a novel serine protease gene. A 
2.1-kb EcoRI-XhoI insert from this EST clone was used to screen a
human heart cDNA library (CLONTECH). Approximately 5 x 10®
lambda phage clones were screened, and two positive clones were iso- 
lated that contained inserts of 3.5 and 3.1 kb, respectively. The DNA 
sequences of these two clones were determined. Oligonucleotide prim- 



^ The abbreviations used are: LDL, low density lipoprotein; EST, 
expressed sequence tag; FISH, fluorescent in situ hybridization; 
GAPDH, glyceraldehyde-3-phosphate dehydrogenase; ORF, open read- 
ing frame; RT, reverse transcriptase; PCR, polymerase chain reaction; 
RACE, rapid amplification of cDNA ends; TAPVR, total anomalous 
pulmonary venous return; kb, kilobase pair; bp, base pair; E, embryonic 
day. 
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ers were designed to clone further 5' end cDNA sequences by 5' rapid 
amplification of cDNA ends (RACE) using Marathon-ready human 
heart cDNA templates (CLONTECH). The PCR products from 5' RACE
were cloned into the pCRII vector (Invitrogen, San Diego, CA) and
sequenced. Oligonucleotide primers used in the 5' RACE experiments
were 5'-CAGTTGGTTTGAACAAGTGCAGGG-3', 5'-TGCAAGGAGGGATACGCTCGCCTG-3',
5'-AATCCCAAGAACAGACTCACAGCG-3', 5'-CGGGTCACAGAGAGAGCTACCACC-3',
5'-GGTCTCCTTCTTGACATGAATCTG-3', 5'-CGGAGCCCCATGAAGTTAAACCA-3', and
5'-AACAAAAGGATCCTTGGAGGTCGGACGAGT-3'. The final 5' end
sequence of human corin cDNA was derived from at least three
independent clones. The full-length cDNA sequence was compiled using 
the Genetics Computer Group (GCG) software (version 9.1, Madison, 
WI). 

Northern Analysis — Northern blots containing poly(A)+ RNA
samples (2 µg/lane) from multiple human and mouse tissues were
purchased from CLONTECH. Human and mouse corin cDNA probes were
labeled with [32P]dCTP using a random primed DNA labeling kit (Roche
Molecular Biochemicals). Northern hybridization was performed at
42 °C overnight in a solution containing 40% formamide, 5× Denhardt's
solution, 6× SSC, 100 µg/ml salmon sperm DNA, and 0.1% SDS. Blots
were washed with 0.2× SSC, 0.1% SDS at 60 °C and then exposed to
Fuji imaging plates. As a control, the blots were reprobed with a human
actin cDNA probe provided by CLONTECH.

RT-PCR—mRNA samples were isolated from HEC-1-A, U2-OS,
SK-LMS-1, and AN3-CA cells using a commercial RNA preparation kit
(Oligotex Direct mRNA Mini Kits, Qiagen). First-strand cDNAs were
synthesized using SuperScript II RNase H− reverse transcriptase (Life
Technologies Inc.). Human corin-specific oligonucleotide primers (sense
primer, 5'-AACAAAAGGATCCTTGGAGGTCGGACGAGT-3', and antisense
primer, 5'-CGGAGCCCCATGAAGTTAATCCA-3') were used to
amplify a 630-bp fragment of corin cDNA between nucleotides 2475 and
3105. Oligonucleotide primers TFR1 (5'-GTCAATGTCCCAAACGTCACCAGA-3')
and TFR2 (5'-ATTTCGGGAATGCTGAGAAAACAGACAGA-3'), derived from the
human glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene, were
used as an internal quantification
control. PCR reactions were performed with a thermal cycler (Perkin- 
Elmer, model 480). PCR products were separated on 1% agarose gels 
and visualized by ethidium bromide staining. 
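As an illustrative check on the RT-PCR primers listed above, the sketch below translates them with the standard genetic code. It assumes that the sense primer begins at the first base of a codon in the corin reading frame and that the reverse complement of the antisense primer is in frame from its second base. Under those assumptions the sense primer encodes NKRILGGRTS, a stretch in which an Arg is followed by Ile-Leu-Gly-Gly, the sequence expected at the amino terminus of a trypsin-like catalytic domain after activation cleavage, and the antisense primer corresponds to GLTSWGS, which contains the Ser-Trp-Gly motif conserved in trypsin-family protease domains. The helper names are hypothetical.

```python
# Standard genetic code, indexed as 16*first + 4*second + third with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate_dna(dna: str, offset: int = 0) -> str:
    """Translate the complete codons of `dna`, starting at `offset` (0, 1 or 2)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(offset, len(dna) - 2, 3))


sense = "AACAAAAGGATCCTTGGAGGTCGGACGAGT"   # RT-PCR sense primer, 5'->3'
antisense = "CGGAGCCCCATGAAGTTAATCCA"      # RT-PCR antisense primer, 5'->3'
print(translate_dna(sense))
print(translate_dna(reverse_complement(antisense), offset=1))
```

The reading frames used here are assumptions chosen to match the corin protein sequence; a primer read in another frame would give an unrelated peptide.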

In Situ Hybridization — Mouse adult heart and embryonic tissue 
sections were deparaffinized in xylene, rehydrated, and fixed in 4% 
paraformaldehyde. The tissues were digested with proteinase K (20
µg/ml), then treated with triethanolamine/acetic anhydride, and
dehydrated. An 800-bp mouse corin cDNA fragment from the coding region
was cloned into pCRII (Invitrogen) in two orientations to yield plasmids
pM11 and pM41. The plasmids were linearized by HindIII digestion.
Sense and antisense probes were synthesized using T7 RNA polymer- 
ase (T7/SP6 transcription kit, Roche Molecular Biochemicals) and la- 
beled with [^^P]UTP (Amersham Pharmacia Biotech). The hybridiza- 
tion was carried out as described (14). The slides were dehydrated and 
dipped in Kodak NTB-2 emulsion and exposed for 4 weeks in light-tight 
boxes at 4 "C. Photographic development was carried out in a Kodak 
D-19 developer. The slides were stained with hematoxylin/eosin and 
analyzed using both light- and dark-field optics of a Zeiss microscope. 

Fluorescent in Situ Hybridization (FISH) Analysis — P1 phage clones
containing the human corin gene were isolated by filter hybridization
using a human corin cDNA as the probe. One clone was confirmed by
DNA sequencing using a primer from human corin cDNA. The DNA
fragment from this P1 phage was labeled with digoxigenin-dUTP. The
labeled probe was combined with sheared human DNA and hybridized
to metaphase chromosomes derived from PHA-stimulated peripheral
blood lymphocytes in a solution containing 50% formamide, 10% dextran
sulfate, and 2× SSC. Hybridization signals were detected with
fluorescent-labeled anti-digoxigenin antibodies, followed by
counterstaining with 4',6-diamidino-2-phenylindole. A total of 80
metaphase cells were analyzed, of which 74 exhibited specific labeling.

Homology Model of the Protease Domain of Corin — A model of the 
corin protease domain (amino acids 802-1042) was built based on the
structure of bovine chymotrypsinogen A at 1.8-Å resolution (15, 16),
using the homology program (Insight II, 1995, MSI, San Diego, CA).
Rotamers were used for non-identical side chain replacements (16).
Coordinates for the loop insertions were extracted from the Brookhaven
Protein Data Bank (17). The model was refined by energy minimization
using the AMBER force field (Discover 95.0) with a distance-dependent
dielectric constant. Minimization used the steepest descent and
conjugate gradient methods in stages: first for only the loops where
insertions and deletions occurred, then for the side chains, and finally
with the Cα atoms held fixed. The residues of corin (His843,
Asp892, and Ser985) corresponding to the catalytic triad of the template
structure were also held fixed.
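The staged protocol above used the AMBER force field in Discover 95.0. The toy below only illustrates the underlying idea of steepest-descent minimization in which selected coordinates are held fixed, as the Cα atoms and catalytic-triad residues were in the corin model. The one-dimensional spring "force field", the step size, and the function names are hypothetical, and the conjugate gradient stage is not shown.

```python
def chain_energy_and_gradient(x: list[float], k: float = 1.0, r0: float = 1.0):
    """Toy energy: harmonic springs (rest length r0) between consecutive 1-D positions."""
    energy = 0.0
    grad = [0.0] * len(x)
    for i in range(len(x) - 1):
        stretch = x[i + 1] - x[i] - r0
        energy += k * stretch ** 2
        grad[i + 1] += 2 * k * stretch   # dE/dx[i+1]
        grad[i] -= 2 * k * stretch       # dE/dx[i]
    return energy, grad


def steepest_descent(x: list[float], frozen: set[int], step: float = 0.1,
                     max_iter: int = 10_000, tol: float = 1e-10):
    """Move every non-frozen coordinate against the gradient until the gradient vanishes."""
    x = list(x)
    for _ in range(max_iter):
        _, grad = chain_energy_and_gradient(x)
        # Frozen coordinates (cf. fixed C-alpha atoms) receive zero displacement.
        free = [0.0 if i in frozen else g for i, g in enumerate(grad)]
        if max(abs(g) for g in free) < tol:
            break
        x = [xi - step * g for xi, g in zip(x, free)]
    return x, chain_energy_and_gradient(x)[0]


coords, energy = steepest_descent([0.0, 0.2, 2.5, 3.0], frozen={0, 3})
print([round(c, 6) for c in coords], energy)
```

With both ends frozen, the two free positions relax to evenly spaced values between them, while the frozen coordinates never move.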

RESULTS 

Cloning of the Full-Length Human Corin cDNA — A computer 
search using the BLAST program identified an EST clone from 
a human heart library that shared significant homology with 
serine protease family members, such as trypsin. The EST 
clone was used to isolate the full-length cDNA of a novel gene, 
designated corin for its abundant expression in the heart. The 
sequence of the full-length corin cDNA, 4933 bp in length, is 
shown in Fig. 1. The size of the cDNA is consistent with the 
length of corin mRNA (~5 kb) detected by Northern analysis
(Fig. 4A). An ATG codon at position 95 may represent the
translation initiation site. The open reading frame (ORF) spans
3126 bp, with a 5'-untranslated region of 94 nucleotides before the
initiation codon. At the 3' end, there is a 1.7-kb 3'-untranslated
region after the stop codon at position 3221. A polyadenylylation
signal (AATAAA) is present 12 nucleotides before the poly(A) tail.
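The positions given above are mutually consistent, which a little arithmetic confirms. The sketch assumes that the 3126-bp ORF runs from the A of the ATG at position 95 up to, but not including, the stop codon, and it uses the common rule of thumb of about 110 Da per residue for the mass estimate; the function name is hypothetical.

```python
def orf_layout(cdna_length: int, atg_start: int, orf_length: int) -> dict:
    """Derive codon count, stop-codon position and flanking lengths (1-based positions).

    Assumes orf_length counts coding bases from the A of ATG up to, but not
    including, the stop codon.
    """
    if orf_length % 3:
        raise ValueError("ORF length must be a multiple of 3")
    residues = orf_length // 3
    stop_start = atg_start + orf_length            # first base of the stop codon
    return {
        "residues": residues,
        "stop_codon_start": stop_start,
        "utr5": atg_start - 1,                      # bases 1 .. atg_start - 1
        "utr3": cdna_length - (stop_start + 2),     # bases after the 3-base stop codon
        "approx_kda": residues * 110 / 1000,        # ~110 Da per residue (rule of thumb)
    }


layout = orf_layout(cdna_length=4933, atg_start=95, orf_length=3126)
print(layout)
```

With these inputs the function gives 1042 residues, a stop codon beginning at position 3221, a 94-nucleotide 5' region, 1710 nucleotides after the stop codon (the 1.7-kb 3'-untranslated region), and roughly 115 kDa, close to the calculated 116 kDa.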

The Domain Structure of Human Corin — The ORF of the 
human corin cDNA encodes a polypeptide of 1042 amino acids 
with a calculated mass of 116 kDa. At the amino terminus of 
the predicted corin protein, there is no discernible signal pep- 
tide sequence. Hydropathy plots using the GCG program iden- 
tified a highly hydrophobic region between amino acids 46 and 
66 (Fig. 2B). This hydrophobic sequence could serve as a po- 
tential transmembrane domain. There are positively charged 
amino acid residues immediately preceding the putative transmembrane
segment, suggesting that corin is a type II transmembrane
protein with the amino terminus present in the
cytosol (18). Consistent with this hypothesis, there are 19 pre- 
dicted N-linked glycosylation sites present in the extracellular 
domains of corin (Fig. 1). 
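The two sequence analyses in this paragraph can be sketched in a few lines. This is a minimal sketch, assuming the Kyte-Doolittle scale with a 19-residue window and a mean-hydropathy threshold of 1.6 (values commonly used to flag candidate membrane-spanning segments), and the usual N-X-S/T sequon (X not Pro) for potential N-linked glycosylation. It is not the GCG implementation used by the authors, and because the corin sequence is not reproduced here, the demonstration uses hypothetical toy sequences. Counting sequons only from residue 67 onward would mimic restricting the count to the region after the putative transmembrane segment (residues 46-66).

```python
import re

# Kyte-Doolittle hydropathy values per residue
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Sequon N-X-S/T with X != Pro; the zero-width lookahead reports overlapping sequons (e.g. "NNST").
SEQUON = re.compile(r"(?=N[^P][ST])")


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every window; entry i covers residues i+1..i+window (1-based)."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


def candidate_tm_segments(protein: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Union of overlapping windows whose mean hydropathy exceeds `threshold`.

    Spans are 1-based and inclusive; because whole windows are merged, span ends
    extend past the hydrophobic core and are only approximate.
    """
    spans: list[tuple[int, int]] = []
    for i, mean in enumerate(hydropathy_profile(protein, window)):
        if mean <= threshold:
            continue
        start, end = i + 1, i + window
        if spans and start <= spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], end)   # extend the current span
        else:
            spans.append((start, end))
    return spans


def sequon_positions(protein: str, first_position: int = 1) -> list[int]:
    """1-based positions of Asn residues in N-X-S/T sequons at or after `first_position`."""
    hits = [m.start() + 1 for m in SEQUON.finditer(protein.upper())]
    return [p for p in hits if p >= first_position]


# Hypothetical toy inputs (not corin): a 20-residue Leu core flanked by Lys,
# and a short peptide with three sequons and one N-P-T non-sequon.
print(candidate_tm_segments("K" * 10 + "L" * 20 + "K" * 10))
print(sequon_positions("MNGTANPTNNST"))
```

A sequon only marks a potential site; whether an Asn is actually glycosylated has to be established experimentally.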

Analysis of the corin protein sequence showed that in the 
extracellular region there are two frizzled-like cysteine-rich 
domains, seven LDL receptor repeats, one macrophage scavenger
receptor-like domain, and one trypsin-like serine protease
domain (Fig. 2A). As shown in Fig. 2A, the two frizzled-like
cysteine-rich domains are located at amino acids 134-259 and
450-573, respectively. The amino acid sequences of these two
domains share significant similarity with the extracellular
cysteine-rich domain of the Drosophila Frizzled protein, a
seven-transmembrane receptor essential for polarity determination
during the development of the fruit fly (19). Frizzled-like
cysteine-rich domains have also been found in other proteins,
such as Dfz2 in Drosophila (20), Lin-17 in Caenorhabditis
elegans (21), and FZ-1 in human (22). The sequences of the two
frizzled-like cysteine-rich domains in corin are closest to those
in Lin-17 and FZ-1. As shown in Fig. 2C, all 10 conserved
cysteine residues are present in the frizzled-like cysteine-rich
domains of corin. 

Between amino acids 268-415 and 579-690 (Fig. 2, A and 
D), there are seven cysteine-rich repeats homologous to the 
LDL receptor class A repeats (23). Each repeat is about 36 
amino acids long and contains six cysteine residues as well as 
a highly conserved cluster of negatively charged amino acids. 
In the LDL receptor, these cysteine-rich repeats bind calcium 
ions and play an essential role in endocytosis of the extracel- 
lular ligands (23). Similar motifs have been found in the extra- 
cellular domain of other membrane receptors, such as LDL 
receptor-related protein (LRP1) (24), megalin (also known as
LRP2 or gp330) (25), complement proteins (26), enterokinase 
(27), and Drosophila proteins yolkless and nudel (28, 29). 

In addition to the frizzled-like cysteine-rich domains and
LDL receptor-like repeats, there is another cysteine-rich region 
between amino acids 713 and 801 in corin (Fig. 2, A and E). 
[Fig. 1, sequence panel: nucleotide sequence of the 4933-bp human corin cDNA (positions 1-4933) with the deduced 1042-residue amino acid sequence]



Fig. 1. Nucleotide sequence of human corin cDNA and its deduced amino acid sequence. The potential codon for the initial methionine,
the translation stop codon, and the polyadenylylation signal are in boldface type and underlined. The putative transmembrane domain is
double underlined. The 19 potential N-linked glycosylation sites are in boldface type and double underlined. An arrow indicates the putative
cleavage site for the activation of the serine protease. The active site residues of the catalytic triad (His843, Asp892, and Ser985) are in boldface type
and underlined.
Fig. 2. A, a schematic presentation of the domain structure of corin
protein. The transmembrane domain (TM), frizzled-like cysteine-rich
domains (CRD), LDL receptor repeats (LDLR), scavenger receptor
cysteine-rich domain (SRCR), and serine protease catalytic domain
(Catalytic) are indicated. Numbers correspond to the amino acid
residues of the ORF shown in Fig. 1. B, hydropathy plots of the
deduced amino acid sequence of corin by the Goldman and Kyte-Doolittle
methods, respectively (36). Hphobic, hydrophobic; Hphilic, hydrophilic.
C, alignment of amino acid sequences of the frizzled-like cysteine-rich
domains from corin and other members of the frizzled family, including
Frizzled in Drosophila, lin-17 in C. elegans, and FZ-1 in human.
D, alignment of amino acid sequences of the seven LDL receptor repeats
of corin with the consensus sequence derived from the human LDL
receptor. E, alignment of amino acid sequences of the scavenger
receptor-like cysteine-rich domains from corin and human enterokinase
(Entk), sea urchin speract receptor (q17064), and human scavenger
receptor I (o15393). Asterisks indicate conserved residues.
F, alignment of amino acid sequences of protease domains from human
corin, prekallikrein (KAL), enterokinase (ENTK), trypsin (TRP1), and
bovine chymotrypsinogen A (CTRA).
Fig. 3. Molecular model of the protease domain of corin between
amino acids 802 and 1042. A corin model was built based on
the structure of bovine chymotrypsinogen A, as described under "Experimental
Procedures." The active site residues of the catalytic triad
(His843, Asp892, and Ser985) are shown in purple. Four disulfide bonds in
the corin model (Cys^=***-Cys*"**, Cys^^'^-Cys"*'. Cys®'^^-Cys^®\ and 
Cys"3i_Qygioioj ii^Qi correspond to the disulfide bonds in the catalytic 
domain of chymotrypsinogen (Cys''^-Cys*'^^, Cys^'^^-Cys^*'^, Cys'^^- 
Cys^"', and Cys****-Cys^'*") are shown in blue. The side chains of Cys^^' 
and Cys®^** of the corin model are in an acceptable proximity to form a 
disulfide bond (pink). The distance between the Cα atoms from the
chymotrypsinogen template (Val^* and Gly'") corresponding to these 
two cysteine residues is 5.08 Å, and the distance between the sulfur
atoms after rotamer searching of the cysteine side chains is about 2.5 Å.
The potential disulfide bond between Cys^°° and Cys**^^ of corin corre- 
sponding to the disulfide bond between Cys^ and Cys^^^ of chymo- 
trypsinogen is not included in the model. 



This region contains 88 amino acids and is homologous to the 
cysteine-rich motif found in the macrophage scavenger receptor 
(30). This motif is also present in the sea urchin spermatozoa 
speract receptor (31, 32) and the vertebrate serine protease, 
enterokinase (27).

At the carboxyl terminus of corin protein between amino acid 
residues 802 and 1042, there is a trypsin-like serine protease 
domain (Fig. 2A). This protease domain is highly homologous to 
the catalytic domain of members of the trypsin superfamily. 
For example, amino acid sequence identities between corin and 
prekallikrein (33), factor XI (34), and hepsin (35) are 40, 40, 
and 38%, respectively. All essential features of serine protease 
sequences are well conserved in corin (Figs. 1 and 2F). The
active site residues of the catalytic triad are located at His843,
Asp892, and Ser985. The amino acid residues forming the
substrate specificity pocket are located at Asp979, Gly1007, and
Gly1018. These residues are predicted to bind the substrate P1
residue, suggesting that corin would cleave its substrates after
basic residues, such as lysine or arginine. In addition, a putative
activation cleavage site was found at Arg801, suggesting
that corin would be synthesized as an inactive zymogen and
that another trypsin-like enzyme would be required for its
activation.
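The reasoning in this paragraph (a trypsin-like P1 preference for Lys/Arg, and activation cleavage after an Arg that precedes the Ile-Val/Leu-Gly-Gly start of the catalytic domain) can be expressed as two simple sequence scans. This is a minimal sketch under stated assumptions: the fragment NKRILGGRTS is the translation of the RT-PCR sense primer given under "Experimental Procedures," numbered from residue 799 so that its Arg falls at position 801 as stated above. The function names and the motif pattern are hypothetical simplifications, not a validated predictor.

```python
import re

# Hypothetical motif: Arg/Lys (P1) immediately followed by Ile-(Val/Ile/Leu)-Gly-Gly,
# the typical N-terminus of a trypsin-like catalytic domain after activation.
ACTIVATION_MOTIF = re.compile(r"[RK](?=I[VIL]GG)")


def activation_sites(fragment: str, first_residue: int) -> list[int]:
    """Residue numbers of Arg/Lys that would be P1 in an activation cleavage."""
    return [first_residue + m.start() for m in ACTIVATION_MOTIF.finditer(fragment.upper())]


def basic_p1_sites(fragment: str, first_residue: int) -> list[int]:
    """Residue numbers of every Lys/Arg, i.e. candidate P1 residues for a trypsin-like enzyme."""
    return [first_residue + i for i, aa in enumerate(fragment.upper()) if aa in "KR"]


# Fragment encoded by the RT-PCR sense primer; numbering from 799 is an assumption.
fragment = "NKRILGGRTS"
print(activation_sites(fragment, 799))
print(basic_p1_sites(fragment, 799))
```

The first call reports residue 801 and the second reports 800, 801, and 806, which illustrates why a preference for basic P1 residues alone is far less specific than the activation motif.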

In the protease domain, there are 12 cysteine residues. Po- 
tential pairing of these cysteine residues can be predicted by 
comparing with other well studied serine proteases, such as 
trypsin and chymotrypsin. First three pairs of cysteine resi- 
dues present in essentially all members of the trypsin super- 
family are located at Cys^^S-Cys^^, Cys^'^^-Cys^'^°, and 
Cys^^^-Cys'^'O, Two more pairs of cysteine residues are pres- 
ent at the positions Cys^^*'-Cys^^''^ and Cys^^®-Cys®^\ These 
two pairs of cysteine residues are commonly found in a sub- 
family of two-chain serine proteases, such as chymotrypsin and 
prekallikrein (33). The presence of Cys^^° and Cys^^^ indicated 
Fig. 4. Northern analysis of corin mRNA expression. Human
and mouse multiple tissue Northern blots were hybridized with human 
and mouse corin cDNA probes, respectively. In human tissues (A and 
B), corin mRNA was detected only in samples from heart. In mouse 
tissues (C), abundant expression of corin mRNA was detected in sam- 
ples from heart. Weak signals were also detected in samples from testis 
and kidney. 

that, after the activation cleavage at Arg801, the catalytic
domain of corin would remain attached to the rest of the molecule by
a disulfide bond. Interestingly, there is one additional pair of
cysteine residues, Cys®^^ and Cys®*^°, present in corin. Cysteine
residues at these two positions were not found in any other
vertebrate serine protease. A database search showed
that a chymotrypsinogen-like serine protease from the
lugworm, Arenicola marina,² has two cysteine residues at the
corresponding positions. A model of the corin protease domain
was built based on the structure of bovine chymotrypsinogen A
(Fig. 3). In this model, in which the Cα atoms of
these two cysteine residues were held fixed during energy
minimization, the distance between the sulfur atoms of their
side chains is about 2.5 Å after rotamer searching. The model
indicates that these two cysteines are likely to form a disulfide
bond connecting two β-sheets in the core of the protease
domain (Fig. 3).
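The proximity argument for the extra cysteine pair can be phrased as a simple geometric screen on model coordinates. The sketch below assumes approximate criteria: an S-S distance within about 0.6 Å of the ideal 2.05 Å, and a Cα-Cα distance of roughly 4.4-6.8 Å. These cutoffs, the function name, and the coordinates are illustrative assumptions, not the authors' procedure or data; the coordinates are chosen only to reproduce the 5.08 Å and 2.5 Å distances quoted for Fig. 3.

```python
import math

# Assumed, approximate disulfide geometry (illustrative cutoffs only)
SS_IDEAL = 2.05          # Å, typical S-S bond length
SS_TOLERANCE = 0.6       # Å, loose tolerance for an unrefined model
CA_RANGE = (4.4, 6.8)    # Å, rough C-alpha/C-alpha range for bonded cysteines

Point = tuple[float, float, float]


def disulfide_check(ca1: Point, ca2: Point, sg1: Point, sg2: Point) -> dict:
    """Distances between two cysteines and whether they pass the rough screen above."""
    ca_ca = math.dist(ca1, ca2)
    s_s = math.dist(sg1, sg2)
    plausible = (CA_RANGE[0] <= ca_ca <= CA_RANGE[1]
                 and abs(s_s - SS_IDEAL) <= SS_TOLERANCE)
    return {"ca_ca": ca_ca, "s_s": s_s, "plausible": plausible}


# Hypothetical coordinates reproducing the distances quoted for Fig. 3
result = disulfide_check((0.0, 0.0, 0.0), (5.08, 0.0, 0.0),
                         (1.0, 1.5, 0.0), (3.5, 1.5, 0.0))
print(result)
```

A screen like this only says that a bond is geometrically possible in the model; it does not show that the disulfide forms in the native protein.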

Northern Analysis of Corin mRNA Expression — To determine
expression of the corin gene in human tissues, Northern
hybridization was performed using human corin cDNA probes.
As shown in Fig. 4A, an ~5-kb transcript was detected only in
the heart and not in other tissues, including brain, placenta,
lung, liver, skeletal muscle, kidney, pancreas, spleen, thymus,
prostate, testis, ovary, colon, and leukocytes. Because the heart is
mainly composed of cardiac muscle, Northern analysis was

² J. Eberhardt, GenBank™ accession number G1160388.
Fig. 5. Analysis of corin mRNA expression by in situ hybridization in an adult mouse heart. Tissue sections from atrium (B) and ventricle (A) were stained with hematoxylin/eosin. Corin mRNA was detected by in situ hybridization using a mouse corin cDNA probe. Expression of corin mRNA was found in the cardiac myocytes of both the atrium (D) and the ventricle (C) as shown by white spots.








Fig. 6. Expression of corin mRNA in the developing heart. Tissue sections were prepared from mouse embryos at day E9.5 (A and B), E11.5 (C and D), E12.5 (E and F), and E15.5 (G-J) and stained with hematoxylin/eosin (A, C, E, G, and I). Corin mRNA expression was detected by in situ hybridization in the developing heart by E9.5 (B) and E11.5 (D) as indicated by arrows. The expression was prominent in the primary atrial septum and the trabecular ventricular compartment by E12.5 (F). By E15.5, corin mRNA was detected in most cardiac myocytes in both atrium (H) and ventricle (J). Abbreviations used in E, G, and I are as follows: Atr, atrium; V, ventricle; Ar, aorta; Vc, vena cava; E, esophagus; Lu, lung.



performed to examine the presence of corin mRNA in other human muscle-rich tissues. Again, corin mRNA was detected in the heart but not in uterus, small intestine, bladder, stomach, and prostate (Fig. 4B).

To examine corin mRNA expression in mice, the full-length mouse corin cDNA was cloned by a PCR-based strategy. Mouse corin cDNA shared 89% sequence identity with human corin cDNA (data not shown). Northern analysis was performed with RNA samples from mouse tissues. As shown in Fig. 4C, a prominent transcript of ~6 kb was detected in samples derived from the heart. In contrast to Northern analysis with human samples, low levels of corin mRNA were also detected in samples derived from the testes and kidneys.
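The 89% figure above is a percent-identity value between two sequences. As a hedged sketch of what such a number means, the Python function below computes percent identity for a pair of sequences that have already been aligned. The gap-counting convention (identical non-gap columns divided by total alignment length) and the toy fragments are assumptions for illustration; the paper does not state how its figure was computed.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two sequences that have ALREADY been aligned.

    Assumed convention: columns where both sequences carry the same
    non-gap character, divided by the total alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Toy aligned fragments (hypothetical, not corin sequence); '-' marks a gap.
print(percent_identity("ATGGCC-TTA", "ATGACCGTTA"))
```

In practice the alignment itself would come from a separate tool such as BLAST or a global aligner (not shown), and reported identity values shift depending on how gaps and unaligned ends are counted.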

Mouse Corin mRNA Expression in Adult and Embryonic Hearts — In situ hybridization was performed to determine the temporal and spatial expression of corin mRNA. In adult mice (Fig. 5), corin mRNA was detected in cardiac myocytes of both atrium and ventricle. The level of expression appeared to be higher in the atrium than in the ventricle. During embryonic development, corin mRNA was first detected at E9.5 in both
Fig. 7. Expression of corin mRNA in other tissues during embryonic development. Tissue sections were stained with hematoxylin/eosin. In situ hybridization was performed using a mouse corin cDNA probe, as described under "Experimental Procedures." A and B, expression of corin mRNA in cartilage primordia of vertebral bodies of an E13.5 embryo. C and D, expression of corin mRNA in the turbinate primordium around the nasal and eye cavities of an E15.5 embryo. E and F, expression of corin mRNA in a developing digital bone in a front paw at E15.5. Corin mRNA was detected in the region adjacent to the hypertrophic chondrocytes and in the perichondrocytes. G and H, in a more mature digital bone in a hind limb of an E15.5 embryo, a similar pattern of corin mRNA expression was found in the region adjacent to the hypertrophic chondrocytes and in the perichondrocytes. I and J, expression of corin mRNA in the medulla of a developing kidney at E15.5. K and L, expression of corin mRNA in the decidual cells of a pregnant uterus. Abbreviations used are: V, vertebral bodies; N, nasal cavity; E, eye cavities; Hy, hypertrophic chondrocytes; P, perichondrocytes.




atrium and ventricle of the developing heart (Fig. 6B). Between E11.5 and E13.5, corin mRNA was highly expressed in the thickened atrial wall and in the regions that underwent trabeculation in the ventricle (Fig. 6, D and F). By E15.5, corin mRNA in the heart was more abundant, especially in the primary atrial septa (Fig. 6H). Weak signals appeared to be present in the developing aorta and vena cava but not in the esophagus and lungs (Fig. 6H). The expression of corin mRNA in the heart was maintained in the subsequent embryonic stages (not shown).

Corin mRNA Expression in Other Tissues — In addition to the heart, corin mRNA was also detected in other mouse tissues by in situ hybridization. For example, corin mRNA was present in the uterus of pregnant mice and in the developing kidneys. In the uterus (Fig. 7L), corin mRNA expression was most abundant in the decidual cells close to the implantation site of the embryo. In the developing kidneys at E15.5, corin mRNA was
Fig. 8. Analysis of corin mRNA expression in tumor cell lines by RT-PCR. RNA samples were isolated from human tumor cell lines. RT-PCR experiments were performed using oligonucleotide primers derived from human corin cDNA. Corin mRNA was detected in samples from HEC-1-A, U2-OS, SK-LMS-1, RL95-2, and AN3-CA cells (upper panel, lanes 2-6) but not in samples from HeLa cells (upper panel, lane 1). In a control experiment, PCR reactions were performed with specific oligonucleotide primers for the human GAPDH gene. GAPDH mRNA was detected in samples from all cell lines (lower panel, lanes 1-6).

highly expressed in the stromal cells in the medulla but not in the cortex of the kidney (Fig. 7J). This finding was consistent with the results of Northern analysis in which a corin transcript was found in RNA samples from mouse kidneys (Fig. 4C).

Interestingly, in situ hybridization also identified corin mRNA in several cartilage-derived structures, such as the vertebra in the tail, the turbinate in the head, and the long bones in the limbs (Fig. 7, A-H). Fig. 7B showed the expression of corin mRNA in cartilage primordia of vertebral bodies in the posterior of an E13.5 embryo. By E15.5, the level of corin mRNA expression in the vertebra was much lower as the vertebra became more mature (data not shown), indicating that corin may play a role in the differentiation of chondrocytes. This notion was supported by the expression of corin mRNA in developing limbs. Fig. 7, E and F, showed an early developing digital bone that consisted of three types of cells: hypertrophic chondrocytes at the center, prehypertrophic chondrocytes next to the hypertrophic zone, and proliferating chondrocytes at both ends. Corin mRNA was found mostly in the prehypertrophic chondrocytes (Fig. 7F). Hybridization signals were also present in the perichondrium (Fig. 7F). Fig. 7, G and H, showed a long bone in a hind limb that was at a more advanced developmental stage. The central hypertrophic zone was replaced by vascularized tissues containing bone marrow cells and osteoblasts. Nevertheless, a similar expression pattern of corin mRNA was found in the narrow zone of the prehypertrophic chondrocytes and in the perichondrium. These results indicated that corin expression was associated with a specific stage of chondrocyte differentiation.

Corin mRNA Expression in Human Tumor Cell Lines — A number of human cancer cell lines were screened by Northern and RT-PCR analyses for the presence of corin mRNA. In most cell lines, such as HL60, HeLa, K562, MOLT-4, RAJI, SW480, A549, and G36, corin mRNA was undetectable (data not shown). However, corin mRNA was found in several cell lines derived from uterine tumors or osteosarcoma. As shown in Fig. 8, corin mRNA was detected by RT-PCR in the endometrial carcinoma cell lines HEC-1-A, AN3 CA, and RL95-2, the leiomyosarcoma cell line SK-LMS-1, and the osteosarcoma cell line U2-OS. The result is consistent with the in situ hybridization finding that corin mRNA was highly expressed in the developing bones of embryos as well as in the maternal uterus.

Chromosomal Localization of the Human Corin Gene — FISH analysis was performed to determine the chromosomal locus of the human corin gene. Specific fluorescent spots were found at 4p12-13, a region adjacent to the centromere on the short arm of chromosome 4 (Fig. 9). The result was confirmed in a subsequent experiment in which a genomic probe previously mapped to 4p15.3 was co-localized with the corin gene probe (data not shown). A search of the OMIM human genetic data base indicated that a congenital heart disease locus, total anomalous pulmonary venous return (TAPVR), was previously mapped to this region at 4p13-q12 (37).

DISCUSSION 

In this study, we describe the cloning and initial characterization of a novel cDNA from the human heart that encodes a putative transmembrane serine protease, which we have designated as corin. The presence of a hydrophobic transmembrane domain at its amino terminus and the absence of a signal peptide suggest that corin is a type II transmembrane protein. In the extracellular region of corin, there is a trypsin-like catalytic domain that contains all conserved structural features of serine proteases, such as the catalytic triad, the activation cleavage site, the substrate specificity pocket, and the essential cysteine residues. Interestingly, the protease domain of corin contains two unique cysteine residues, Cys®^^ and Cys830, that are not present in other trypsin-like serine proteases in vertebrates. Molecular modeling showed that these two cysteine residues are likely to form a disulfide bond connecting two β-sheets in the core of the protease domain (Fig. 3). A search of genomic data bases showed that a chymotrypsin-like protease found in the lugworm, A. marina, also has two cysteine residues at the corresponding positions. It is not clear whether these two cysteine residues are maintained through convergent or divergent evolution. Nevertheless, the presence of such an unusual pair of cysteine residues in both corin and the lugworm protease suggests an important biological function of the disulfide bond. One possibility is that the disulfide bond may contribute to the stability of the proteases.

Although members of the trypsin superfamily are known to contain a variety of domain structures such as kringle and epidermal growth factor-like domains that are important for protein-protein interactions, this is the first report of the presence of a frizzled-like cysteine-rich domain in this extended family. Originally, the frizzled gene was identified in Drosophila (38). The gene encodes a seven-transmembrane receptor that is required for proper development of hairs, bristles, and ommatidia of the fruit fly (19, 39). Later, other Frizzled proteins were identified in many other species. They all contain a well conserved extracellular cysteine-rich domain and a seven-transmembrane domain and act as receptors for secreted Wnt glycoproteins (for review see Refs. 40 and 41). The cysteine-rich domain, which is about 120 amino acids in length and contains a motif of 10 invariantly spaced cysteine residues, has been shown to be necessary and sufficient for the binding of the Wnt ligands (20, 42). Recent studies demonstrated that Frzb, a secreted frizzled-like protein without the seven-transmembrane domain, is expressed in the Spemann organizer of frog embryos and can bind and inhibit Wnt-8 (43, 44). In addition, similar frizzled-like cysteine-rich domains have also been found in several other proteins, including the mouse collagen α1(XVIII) chain (45), human carboxypeptidase Z (46), and several receptor tyrosine kinases (47-49). The function of the cysteine-rich domain in these proteins has not been determined. Corin is unique in that it contains both frizzled-like cysteine-rich domains and a serine protease domain. The presence of frizzled-like domains in corin implies that corin may play an important role in development by directly interacting with Wnt proteins.

The temporal and spatial pattern of corin gene expression further supported a potential developmental function of corin. In mice, corin mRNA was detected in the cardiac myocytes of the embryonic heart as early as E9.5 (Fig. 6B). The expression was most prominent in the primary atrial septum and the trabecular ventricular compartment by E11.5-13.5 (Fig. 6, D
Fig. 9. Chromosomal localization of the human corin gene by FISH. A fluorescent-labeled genomic DNA probe containing the human corin gene was hybridized to metaphase chromosomes derived from PHA-stimulated peripheral blood lymphocytes. Hybridization signals are shown as bright blue spots and indicated by white arrows (left panel). The position of the corin locus on human chromosome 4 is illustrated in a diagram (right panel).



and F). During this period, an active process of looping and remodeling takes place in the embryonic heart. As a result, outflow tracts are formed, and the original single tube-like heart is reorganized into a four-chambered structure. Growth factors, such as bone morphogenetic proteins and the transforming growth factor-β family members, are known to play a critical role during embryonic heart development (50). Recent studies in Drosophila showed that the wingless (wg) gene, a homologue of the wnt oncogene in mammals, is directly involved in heart formation (51). It has been suggested that similar signaling pathways also contribute to heart development in vertebrates (52). It is possible that corin could participate in such developmental pathways by interacting directly with Wnt proteins or other growth factors.

In addition to the heart, corin mRNA was identified in other tissues, such as the pregnant uterus and developing kidneys and bones. The expression of corin mRNA in these tissues appeared to be cell type-specific. For example, in developing long bones corin mRNA was specifically expressed in the prehypertrophic chondrocytes. It is known that skeletal bones are derived from two different processes, intramembranous and endochondral ossification. In the former case, mesenchymal tissues are directly converted into bone, whereas in the latter case the mesenchymal cell is converted to bone via cartilage as an intermediate step. The vertebrae, long bones, and certain fragments of skull are formed by endochondral ossification (53). In these bones, mesenchymal cells first become chondrocytes that in turn differentiate from proliferating chondrocytes to prehypertrophic chondrocytes and finally to hypertrophic chondrocytes. The hypertrophic chondrocytes eventually undergo apoptosis followed by vascularization and ossification. This process of chondrocyte differentiation has been shown to be tightly regulated by hedgehog proteins, bone morphogenetic proteins, and parathyroid hormone-related protein (54-57). The specific expression of corin mRNA in a subset of chondrocytes indicated that corin may also be involved in this cell differentiation process.

Finally, by FISH analysis the human corin gene was located on the short arm of chromosome 4 (4p12-13) (Fig. 9). A search of the OMIM human genetic data base showed that a disease locus, total anomalous pulmonary venous return (TAPVR), had been previously mapped to this region. TAPVR is a rare cyanotic form of congenital heart defect in which the pulmonary veins connect abnormally to the right atrium or one of the venous tributaries instead of the left atrium. The molecular mechanism responsible for this developmental defect in the heart is unknown. A linkage study of a large Utah-Idaho family that included 14 affected individuals localized the TAPVR locus to a 30-centimorgan interval on 4p13-q12 (37). The findings that the corin gene and the TAPVR locus are co-localized on chromosome 4 and that corin mRNA is highly expressed in the embryonic heart, particularly in the region where outflow tracts are formed, suggest that corin is an attractive candidate for the TAPVR gene. The isolation of the corin cDNA provides a useful tool to study this intriguing possibility further.
include the small glycine (a single hydrogen atom) and alanine, serine and threonine (with attached hydroxyls), and cysteine (with its sulfhydryl). Proline has a hydrocarbon side chain, but its conformational properties put it at corners and therefore often outside.

Results of x-ray crystallography show these classifications by polarity and location to be valid in general for soluble globular proteins. The structures of myoglobin and hemoglobin, lysozyme, and cytochrome c all have buried hydrophobic side chains with hydrophilic side chains on the surface. Figure 1-11 shows the positions of all 104 side chains for horse heart cytochrome c. This is a protein with a heme group like myoglobin, but with an entirely different function. It is one of a chain of molecules that transports electrons in the mitochondria. Hydrophobic side chains (colored) pack inside the molecule, especially against the left side of the heme ring, and hydrophilic side chains (grey) are distributed over the surface of the molecule. This is a clear example of one way in which sequence dictates folding.

Other side chains have pronounced effects on three-dimensional conformation, particularly proline and the sulfur-containing cysteine. The side chain of proline contains a portion of the main chain and thus tends to change the direction of the main chain. Proline is often used to produce a bend in the protein chain, and many of the α helices in myoglobin and hemoglobin begin with a proline residue. The side-chain -SH of cysteine can make a covalent -S-S- linkage with a similar residue from another protein chain (Figure 1-12). After the protein chain has reached its optimal low-energy conformation, the disulfide bonds can increase its stability. The enzyme ribonuclease contains four such disulfide bridges. If the -S-S- linkages are broken and the protein chain is made to unfold in the presence of a denaturing agent, such as urea, would it refold when the denaturing chemicals were removed? Christian Anfinsen and coworkers answered this question in the affirmative in the early 1960s with a classic set of experiments.

We have seen that sequence determines folding, but, in fact, it does more than that. It determines a unique folding pattern. The importance of the folding pattern can be appreciated through a consideration of the protein's function. Enzymes, for example, are molecular machines that operate with great precision on other molecules called substrates. Chymotrypsin is one of a class of pancreatic digestive enzymes that cut other protein chains. The substrate is a polypeptide chain that is held on the surface of the enzyme so that a peptide bond can be cleaved. It is necessary that the substrate mesh with the enzyme in an exact lock-and-key fashion. In chymotrypsin there is a specificity pocket that fits an aromatic ring side chain of the substrate. Immediately adjoining the specificity pocket is an active site that assists in cutting a peptide bond near the bound aromatic ring.
CHAPTER 1

PROTEINS:
AN OVERVIEW

Figure 1-10

The 20 amino acid side chains classified by their probable position in the protein molecule. Three-letter and one-letter codes are given for each. The forms shown here are the most prevalent at pH 7. Note that histidine can play a dual role: neutral (as shown here) or positively charged.
DETAILED ACTION

This application is a CIP of 09/657,986, now issued as U.S. Patent No. 
6,797,504. 

Continued Examination Under 37 CFR 1.114 

A request for continued examination under 37 CFR 1.114, including the fee set 
forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this 
application is eligible for continued examination under 37 CFR 1.114, and the fee set 
forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action 
has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on January 30, 2006, amending claims 1, 5, 12-13, and 113-114 and canceling claims 6-7, 9-10, 14, 16, 18 and 137, has been entered.

Claims 1-3, 5, 10-13, 19-20, 34-36, 40-46, 48-55, 108-109, 113-116, 118-120 and 122-126 are pending. Claims 1-3, 5, 10-13, 19-20, 34-36, 40-46, 48-55, 108-109, 113-116, 118-120 and 122-126 are withdrawn. Claims 1-3, 5, 11-13, 19-20, 34-36, 40-42 and 113-114 are under consideration.

Priority 

Applicant's claim for domestic priority under 35 U.S.C. 119(e) is acknowledged. However, the provisional applications upon which priority is claimed fail to provide adequate support under 35 U.S.C. 112 for claims 11-13 and 34 of this application.
Provisional applications 60/179,982, 60/183,542, 60/213,124, 60/220,970 and 60/234,840 fail to provide adequate support for polypeptides comprising the serine protease domain of MTSP1. Provisional applications 60/179,982 and 60/183,542 describe polypeptides related to MTSP3, and provisional applications 60/213,124, 60/220,970 and 60/234,840 describe polypeptides related to MTSP4.

Therefore, the effective filing date for purpose of prior art is the filing date of 
09/657,986, which is 9/8/2000. 

Response to Arguments 

Applicant's amendment and arguments filed on January 30, 2006, have been 
fully considered and are deemed to be persuasive to overcome the rejections previously 
applied. Rejections and/or objections not reiterated from previous office actions are 
hereby withdrawn. 

Claim Objections 

Claims 11-13 and 34 are objected to as being drawn to non-elected subject matter. In response to the previous Office Action, applicants have traversed the above rejection. Applicants argue that claims 11-13 and 34 are directed to elected subject matter. Even though the claims are drawn to MTSP1, the elected subject matter, the claims are also drawn to non-elected subject matter, i.e., MTSP3 (SEQ ID NO:4), MTSP4 (SEQ ID NO:6), MTSP6 (SEQ ID NO:12), corin, enteropeptidase, human airway trypsin-like protease, TMPRSS2 and TMPRSS4. Hence the objection is maintained.
Claim Rejections - 35 USC §112

The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

Claims 1-3, 5, 11-12, 13 and claims 19-20, 34-36, 40-42 and 113-114 depending therefrom are rejected under 35 U.S.C. 112, second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention.

Claims 1-3, 5, 11-12, 13 recite the phrase "substantially purified single-chain
polypeptide". The metes and bounds of the phrase in the context of the above claims 
are not clear to the Examiner. It is not clear to the Examiner what is considered as 
"substantially purified" by the applicants. A perusal of the specification did not provide a 
clear definition for the above phrase. Without a clear definition, those skilled in the art 
would be unable to conclude if a polypeptide is a "substantially purified" polypeptide 
without knowing the metes and bounds of the phrase. Examiner requests clarification of 
the above phrase. 

Claim 1 and claims 2-3, 5, 11-13, 19-20, 34-36, 40-42 and 113-114 depending
therefrom are rejected under 35 U.S.C. 112, second paragraph, as being indefinite for 
failing to particularly point out and distinctly claim the subject matter which applicant 
regards as the invention. 

Claim 1 recites the phrase "the MTSP protease domain or catalytically active 
fragment thereof is the only portion of the single-chain polypeptide from the MTSP". 
The metes and bounds of the phrase in the context of the claim are not clear. It is not clear to the Examiner how one skilled in the art would identify a given amino acid sequence as being "from MTSP" or not being "from MTSP". Examiner has interpreted the claims broadly to mean that a "single-chain polypeptide comprising a MTSP protease domain or catalytically active fragment thereof is the only portion of the single-chain polypeptide from the MTSP" is a "single-chain polypeptide comprising a fragment consisting of a protease domain or a catalytically active fragment thereof." Examiner requests clarification of the above phrase.

Claims 12-13 and claims 113-114 depending therefrom are rejected under 35 
U.S.C. 112, second paragraph, as being indefinite for failing to particularly point out and 
distinctly claim the subject matter which applicant regards as the invention. 

Claims 12-13 recite the phrase "protease domain has a sequence of amino acid residues set forth as amino acids 615-855 of SEQ ID NO:2" or "protease domain whose sequence of amino acid residues is set forth as amino acid residues 615-855 of SEQ ID NO:2". The metes and bounds of the phrase in the context of the claims are not clear. It is not clear to the Examiner if the recited amino acid sequence has the amino acid sequence of SEQ ID NO:2 or is a representative member of a genus. Examiner suggests amending the phrase as "protease domain comprises amino acids 615-855 of SEQ ID NO:2" to clearly indicate that the protease domain has amino acids 615-855 of SEQ ID NO:2.
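Residue ranges such as "amino acids 615-855 of SEQ ID NO:2" use 1-based, inclusive numbering. As a hedged illustration only, the Python sketch below shows how such a range maps onto a zero-based slice and why it spans 241 residues. The function name and the placeholder sequence are hypothetical; nothing here reflects how the Office or the applicant actually processes sequence listings.

```python
def residue_range(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive numbering,
    the convention of phrases like "amino acids 615-855"."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("residue range falls outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return seq[start - 1:end]


# Hypothetical 900-residue placeholder; NOT the actual SEQ ID NO:2.
placeholder = "A" * 900
domain = residue_range(placeholder, 615, 855)
print(len(domain))  # 241 residues (855 - 615 + 1)
```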
Claims 19-20 are rejected under 35 U.S.C. 112, second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention.

Claims 19-20 recite the phrase "free Cys". The metes and bounds of the phrase in the context of the above claims are not clear to the Examiner. It is not clear to the Examiner what is considered as "free Cys" by the applicants. A perusal of the specification did not provide a clear definition for the above phrase. Without a clear definition, those skilled in the art would be unable to conclude if Cys is "free". Examiner requests clarification of the above phrase.

Claim 19 is rejected under 35 U.S.C. 112, second paragraph, as being indefinite
for failing to particularly point out and distinctly claim the subject matter which applicant 
regards as the invention. 

Claim 19 recites the phrase "exhibits proteolytic activity". The metes and bounds 
of the phrase in the context of the above claim are not clear to the Examiner. It is not 
clear to the Examiner either from the specification or from the claims as to what 
applicants mean by the above phrase. Examiner requests clarification of the above 
phrase. 

The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 
Claims 1-3, 5, 9, 11, 19-20, 34-36, 40-42 and 113-114 are rejected under 35 U.S.C. 112, first paragraph, as containing subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor(s), at the time the application was filed, had possession of the claimed invention.

Claims 1-3, 5, 9, 19-20, 35-36, 40-42 and 113-114 are drawn to a polypeptide comprising a protease or catalytically active portion of a type II membrane-type serine protease (MTSP) from any source. Claims 11 and 34 limit the MTSP polypeptide to a MTSP1 polypeptide from any source. Therefore, these claims are drawn to a genus of polypeptides having any structure. The specification only teaches four species: amino acids 615-855 of SEQ ID NO:2, amino acids 205-437 of SEQ ID NO:4, amino acids of SEQ ID NO:6 and amino acids 217-443 of SEQ ID NO:11. These species are not enough to describe the whole genus, and there is no evidence on the record of the relationship between the structure of the above catalytically active protease domains of SEQ ID NOs: 2, 4, 6 and 11 and the structure of the serine protease domain of any or all MTSP polypeptides or MTSP1 polypeptides. Further, the specification does not describe the structure of a catalytically active portion of any or all MTSP polypeptides. Therefore, the specification fails to describe a representative species of the genus of polypeptides comprising a serine protease domain or a catalytically active portion of a MTSP polypeptide.

Given this lack of description of the representative species encompassed by the genus of the claims, the specification fails to sufficiently describe the claimed invention in such full, clear, concise, and exact terms that a skilled artisan would recognize that applicants were in possession of the inventions of claims 1-3, 5, 9, 11, 19-20, 34-36, 40-42 and 113-114.

Applicant is referred to the revised guidelines concerning compliance with the written description requirement of U.S.C. 112, first paragraph, published in the Official Gazette and also available at www.uspto.gov.

In response to the previous Office Action, applicants have traversed the above 
rejection. 


Applicants argue that the claims meet the written description guideline since the specification teaches common elements of MTSP and protease domains of MTSPs, thereby providing structural and functional characteristics of the various species. Applicants also argue that the specification explicitly provides several catalytically active portions of MTSP, SEQ ID NO:2, 4, 6 and 11 (MTSP1, MTSP3, MTSP4 and MTSP6), along with how to make other catalytically active fragments of MTSP, and therefore the specification provides "relevant, identifying characteristics" of a representative number of species of the claimed genus. Examiner respectfully disagrees. The claims are drawn to polypeptides comprising any protease domains or any or all catalytically active fragments of said protease domains of any or all MTSP or any or all MTSP1, including any or all recombinants, variants and mutants of said MTSP or MTSP1. The claims are drawn to polypeptides having any structure; therefore, the claims are drawn to a genus encompassing species having substantial variation, and the specification fails to describe a representative number of species. As discussed in the written description guidelines,
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the written description requirement for a claimed genus may be satisfied through 
sufficient description of a representative number of species by actual reduction to 
practice, reduction to drawings, or by disclosure of relevant, identifying characteristics, 
i.e., structure or other physical and/or chemical properties, by functional characteristics 
coupled with a known or disclosed correlation between function and structure, or by a 
combination of such identifying characteristics, sufficient to show the applicant was in 
possession of the claimed genus. A representative number of species means that the 
species which are adequately described are representative of the entire genus. Thus, 
when there is substantial variation within the genus, one must describe a 
sufficient variety of species to reflect the variation within the genus. Satisfactory 
disclosure of a representative number depends on whether one of skill in the art would 
recognize that the applicant was in possession of the necessary common attributes or 
features of the elements possessed by the members of the genus in view of the species 
disclosed. For inventions in an unpredictable art, adequate written description of a
genus which embraces widely variant species cannot be achieved by disclosing only
one species within the genus. In the instant case, the claimed genera are
drawn to species which are widely variant in structure. The genus of the claims is
structurally diverse, as it encompasses any catalytically active protease domain of any
or all MTSP or MTSP1, the only common requirement being serine protease activity. As such, a
description of solely the structural features present in all members of the genus is not sufficient
to be representative of the attributes and features of the entire genus.
Hence the rejection is maintained. 



Application/Control Number: 09/776,191
Art Unit: 1652
Page 10



Claims 1-3, 5, 9, 19-20, 34-36, 40-42 and 113-114 are rejected under 35
U.S.C. 112, first paragraph, because the specification, while being enabling for a
polypeptide comprising amino acids 615-855 of SEQ ID NO:2, does not reasonably
provide enablement for a polypeptide comprising any protease domain of any type II 
membrane type serine protease (MTSP) or MTSP1 or a catalytically active portion 
thereof. The specification does not enable any person skilled in the art to which it 
pertains, or with which it is most nearly connected, to make and use the invention 
commensurate in scope with these claims. 

Factors to be considered in determining whether undue experimentation is 
required are summarized in In re Wands, 858 F.2d 731, 8 USPQ2d 1400 (Fed. Cir.
1988). They include (1) the quantity of experimentation necessary, (2) the amount of
direction or guidance presented, (3) the presence or absence of working examples, (4)
the nature of the invention, (5) the state of the prior art, (6) the relative skill of those in
the art, (7) the predictability or unpredictability of the art, and (8) the breadth of the
claims.

Claims 1-3, 5, 9, 19-20, 35-36, 40-42 and 113-114 are drawn to a polypeptide
comprising a protease domain or catalytically active portion of a type-II membrane-type serine
protease (MTSP) from any source. Claims 11 and 34 limit the MTSP polypeptide to a
MTSP1 polypeptide from any source. Therefore, these claims are drawn to
polypeptides having undefined structure.

The scope of the claims is not commensurate with the enablement provided by
the disclosure with regard to the extremely large number of polypeptides comprising a
protease or catalytically active domain broadly encompassed by the claims. Since the
amino acid sequence of a protein determines its structural and functional properties,
predicting which changes can be tolerated in a protein's amino acid sequence while
retaining the desired activity requires knowledge of and guidance with regard to which
amino acids in the protein's sequence, if any, are tolerant of modification and which are
conserved (i.e., expectedly intolerant to modification), and detailed knowledge of the
ways in which the protein's structure relates to its function. However, in this case the
disclosure is limited to the polypeptide comprising amino acids 615-855 of SEQ ID
NO:2, or the amino acids of SEQ ID NO:50.

It would require undue experimentation by the skilled artisan to make and use the
claimed polypeptides. The specification is limited to teaching the use of a polypeptide
comprising amino acids 615-855 of SEQ ID NO:2 or the amino acids of SEQ ID NO:50
but provides no guidance with regard to the making of variants and mutants or with
regard to other uses. In view of the great breadth of the claims, the amount of
experimentation required to make the claimed polypeptides, the lack of guidance, 
working examples, and unpredictability of the art in predicting function from a 
polypeptide primary structure, the claimed invention would require undue 
experimentation. As such, the specification fails to teach one of ordinary skill how to 
use the full scope of the polypeptides encompassed by the claims. 

While enzyme isolation techniques, recombinant and mutagenesis techniques
are known, and it is routine in the art to screen for multiple substitutions or multiple
modifications as encompassed by the instant claims, the specific amino acid positions
within a protein's sequence where amino acid modifications can be made with a
reasonable expectation of success in obtaining the desired activity/utility are limited in 
any protein and the result of such modifications is unpredictable. In addition, one skilled 
in the art would expect any tolerance to modification for a given protein to diminish with 
each further and additional modification, e.g. multiple substitutions. 

The specification does not support the broad scope of the claims which 
encompass all modifications and variants of a protease or catalytically active domain or 
modifications of amino acids 615-855 of SEQ ID NO:2, because the specification does
not establish: (A) regions of the protein structure which may be modified without
affecting MTSP/serine protease activity; (B) the general tolerance of MTSP to
modification and extent of such tolerance; (C) a rational and predictable scheme for
modifying any amino acid residue with an expectation of obtaining the desired biological
function; and (D) which of the essentially infinite possible choices is likely to be
successful.

Thus, applicants have not provided sufficient guidance to enable one of ordinary 
skill in the art to make and use the claimed invention in a manner reasonably correlated 
with the scope of the claims broadly including protease or catalytically active domains of 
MTSP with an enormous number of amino acid modifications of the MTSP polypeptides 
and of amino acids 615-855 of SEQ ID NO:2. The scope of the claims must bear a
reasonable correlation with the scope of enablement (In re Fisher, 166 USPQ 18, 24
(CCPA 1970)). Without sufficient guidance, determination of the serine protease
domain or the catalytically active domain of MTSP having the desired biological
characteristics is unpredictable and the experimentation left to those skilled in the art is
unnecessarily, and improperly, extensive and undue. See In re Wands, 858 F.2d 731, 8
USPQ2d 1400 (Fed. Cir. 1988).

In response to the previous Office Action, applicants have traversed the above 
rejection. 

Applicants argue that the level of skill in this art is high and the specification
teaches structural and functional features sufficient to enable one of skill in the art to
make and use the single chain polypeptides comprising a catalytically active portion of an
MTSP protease domain, by providing the structure of MTSP polypeptides and their
protease domains, as well as their conserved structures. Examiner respectfully
disagrees. The scope of the claims, which are drawn to polypeptides comprising any 
protease domains or any or all catalytically active fragments of said protease domains 
of any or all MTSP or any or all MTSP1, including any or all recombinants, variants and
mutants of said MTSP or MTSP1, is not commensurate with the enablement provided
by the disclosure with regard to the extremely large number of polypeptides comprising 
a protease or catalytically active domain broadly encompassed by the claims. Even 
though the structures of some MTSP are known, the claims are drawn to any or all
catalytically active fragments of any or all protease domains of any or all MTSP or
MTSP1. As discussed above, predicting which changes can be tolerated in a
protein's amino acid sequence while retaining the desired activity requires specific
knowledge of and guidance with regard to which specific amino acids in the protein's
sequence can be modified such that the modified polypeptide continues to have the
claimed activity. It is this specific guidance that applicants do not provide. While the art
may teach in general the structure of MTSP, conserved amino acid sequences, protease
domains, X-ray crystal structures, etc., such teachings will not reduce the burden of
undue experimentation on those of ordinary skill in the art.

Applicants argue that the specification discloses working examples; thus, a
person skilled in the art has sufficient guidance in making the claimed polypeptides.
Examiner respectfully disagrees. Even though the structures of some MTSP are taught,
the claims are drawn not only to polypeptides comprising catalytically active fragments
of MTSP1, MTSP3, MTSP4 and MTSP6, but to any or all mutants, variants and
recombinants of any MTSP. Without specific guidance, those skilled in the art will be
subjected to undue experimentation in making and testing each of the enormously large
number of mutants that results from such experimentation. While the art may teach in
general the structure of MTSP, conserved amino acid sequences, etc., such
teachings will not reduce the burden of undue experimentation on those of ordinary skill
in the art.

Hence the rejection is maintained. 

Applicants argue that it would be unfair, unduly limiting and contrary to the public
policy upon which the patent laws are based to require applicant to limit the instant
claims to only one exemplified protease domain. This argument is moot since
patentability is based on statutes under 35 USC 101, 112, 102 and/or 103.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(a) the invention was known or used by others in this country, or patented or described in a printed
publication in this or a foreign country, before the invention thereof by the applicant for a patent

(b) the invention was patented or described in a printed publication in this or a foreign country or in public
use or on sale in this country, more than one year prior to the date of application for patent in the United
States.

(e) the invention was described in (1) an application for patent, published under section 122(b), by
another filed in the United States before the invention by the applicant for patent or (2) a patent
granted on an application for patent by another filed in the United States before the invention by the
applicant for patent, except that an international application filed under the treaty defined in section
351(a) shall have the effects for purposes of this subsection of an application filed in the United States
only if the international application designated the United States and was published under Article 21(2)
of such treaty in the English language. 



Claims 1-3, 5, 11-13, 19-20, 34-36, 40-42 and 113-114 are rejected under 35
U.S.C. 102(b) as being anticipated by Takeuchi et al. (see rejection of the phrase
"MTSP protease domain or catalytically active fragment thereof is the only portion of the
single-chain polypeptide from the MTSP" under 35 USC 112, 2nd paragraph above).

Claims 1-3. 5. 11-13. 19-20 and 34 are drawn to a polypeptide comprising 
fragment consisting of a serine protease domain of MTSP having the characteristics 
recited in the claims. Claims 35-36 are drawn to a conjugate comprising a polypeptide 
comprising a serine protease domain of MTSP and a targeting agent. Claims 40-42
and 113-114 are drawn to a solid support comprising a polypeptide comprising a serine 
protease domain of MTSP. 

Takeuchi et al. (Reference IJ: PTO-1449) teaches a polypeptide comprising a
fragment consisting of a serine protease domain that is 100% identical to amino acids
615-855 of SEQ ID NO:2 of the instant invention (page 11060, 2nd full paragraph).
Takeuchi et al. discloses a purified activated protease domain, comprising amino acids
615-855 of SEQ ID NO:2, confirmed by an N-terminal sequence of the purified,
activated protease domain yielding the expected WGGT sequence (Figure 3 and right
column on page 11057). The MTSP of Takeuchi et al. is not expressed on normal
endothelial cells (page 11054, last paragraph and page 11055, 2nd full paragraph), is of
human origin (Figure 1), consists essentially of the protease domain having catalytic
activity (page 11060, 2nd full paragraph), and is expressed in tumor cells (page 11055,
top paragraph).

Takeuchi et al. teaches a catalytically active polypeptide comprising the serine
protease domain linked to a His-tag (page 11055, 3rd full paragraph; page 11057, 4th full
paragraph). Takeuchi et al. also teaches a solid support comprising said polypeptide
(page 11057, 4th full paragraph and Figure 5). Therefore, the teaching of Takeuchi et
al. anticipates claims 1-3, 5, 11-13, 19-20, 34-36, 40-42 and 113-114.

Examiner notes that the contents of the reference were made public at the 
National Academy of Sciences colloquium held February 20-21, 1999 (see top of
reference). 

In response to the previous Office Action, applicants have traversed the above 
rejections. 

Applicants argue that Takeuchi et al. does not anticipate the instant claims
because it fails to disclose any polypeptide that incorporates all the features of claim 1,
namely a single chain polypeptide having an MTSP portion, wherein the MTSP portion is a
protease domain or a smaller fragment and wherein the MTSP portion has serine
protease activity.

Applicants argue that the MT-SP1 of Takeuchi et al. is a full-length protein that
includes additional MTSP regions other than a protease domain, and therefore, said
MTSP1 of Takeuchi et al. is not a polypeptide where the only MTSP portion of the
polypeptide is a protease domain or a smaller catalytically active portion of the protease
domain. Examiner respectfully disagrees. First, the claim recites "a polypeptide
comprising a MTSP portion" and does not recite the limitation that the
polypeptide consist only of the MTSP portion. Therefore, the full-length MT-SP1 of Takeuchi
et al. anticipates the instant claims. Second, in addition to the full-length MT-SP1,
Takeuchi et al. also discloses a purified activated protease domain, comprising amino
acids 615-855 of SEQ ID NO:2, confirmed by an N-terminal sequence of the purified,
activated protease domain yielding the expected WGGT sequence (Figure 3 and right
column on page 11057). Even applicants state that Takeuchi et al. discloses "that its
protease domain has an amino acid sequence containing amino acids 615-855"
(Remarks page 36) and that "Takeuchi et al. discloses that its polypeptide includes the
pro-domain and that the pro-domain is cleaved during auto-activation, resulting in a
protease domain" (page 37). Therefore, said purified, activated protease domain
anticipates the instant claims.

Applicants also argue that the reference of Takeuchi et al. does not anticipate the
instant claims because the "purified protease domain" of Takeuchi et al. includes the
His-tag sequence, that the polypeptide construct disclosed by Takeuchi et al.
includes a sequence of 19 amino acids of a portion of the pro-domain, and that this
pro-domain is disulfide bonded to the protease domain. Examiner respectfully disagrees.
Takeuchi et al. also discloses a purified activated protease domain, comprising amino
acids 615-855 of SEQ ID NO:2, confirmed by an N-terminal sequence of the purified,
activated protease domain yielding the expected WGGT sequence (Figure 3 and right
column on page 11057 and Figure 6). Further, applicants state that "Takeuchi et al.
discloses that its polypeptide includes the pro-domain and that the pro-domain is
cleaved during auto-activation, resulting in a protease domain" (page 37).

Applicants also argue that the activated protein derived from the expressed His-tagged
amino acids 596-855 of MT-SP1 of Takeuchi et al. is not a single chain polypeptide
because the protease domain is disulfide bonded to a pro-domain, resulting in a two-chain
form. Examiner respectfully disagrees. Takeuchi et al. discloses that the pro-domain
is disulfide bonded to a protease domain of the full length protein. Contrary to
applicants' argument, Takeuchi et al. does not teach that the pro-domain is disulfide
bonded to an activated protease domain. Further, a single chain polypeptide is one
sequence of amino acids beginning with a carboxyl end and terminating with an amino
end, wherein the amino acids are connected via peptide bonds. Therefore, even the full
length MT-SP1 of Takeuchi et al. having disulfide bonds can be construed as a single
chain polypeptide.

In conclusion, Takeuchi et al. discloses a purified activated protease domain, 
comprising amino acids 615-855 of SEQ ID NO:2, confirmed by an N-terminal sequence 
of the purified, activated protease domain yielding the expected WGGT sequence 
(Figure 3 and right column on page 11057 and Figure 6). Further, applicants state that
"Takeuchi et al. discloses that its polypeptide includes the pro-domain and that the
pro-domain is cleaved during auto-activation, resulting in a protease domain" (page 37).
Hence the rejections are maintained. 



Claim Rejections - 35 USC § 102/103 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

The following is a quotation of 35 U.S.C. 103(a), which forms the basis for all 
obviousness rejections, set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior 
art are such that the subject matter as a whole would have been obvious at the time the invention was made to 
a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be 
negatived by the manner in which the invention was made. 



Claims 1-3, 5, 10-13 and 34 are rejected under 35 U.S.C. 102(e) as anticipated by
or, in the alternative, under 35 U.S.C. 103(a) as obvious over O'Brien et al.

Claims 1-3, 5, 10-13 and 34 are drawn to a polypeptide comprising a serine
protease domain of MTSP. 

O'Brien et al. (U.S. Patent No. 5,972,616, reference P, PTO-1449) teaches a
polypeptide having 100% identity to the full length MTSP1 of SEQ ID NO:2 of the instant
invention (SEQ ID NO:2, columns 19-24). The properties recited in claims 2-3 are
inherent properties of MTSP1 taught by O'Brien et al., since the polypeptide of O'Brien
et al. and that of the instant invention have identical structure and therefore identical
properties.

O'Brien et al. teaches a serine protease domain having proteolytic activity that is 
100% identical to amino acids 615-855 of SEQ ID NO:2 (Figure 2, Figure 10 and SEQ
ID NO:14). Although the protease domain of O'Brien et al. identified by SEQ ID NO:14
has not been purified, the protease domain in the reference and the polypeptide claimed 
by the applicants are one and the same. Therefore, the protease domain anticipates 
the instant invention. 

Since the Office does not have facilities for examining and comparing applicant's 
polypeptide with the polypeptide of the prior art, the burden is on the applicant to show a 
novel or unobvious difference between the claimed product and the product of the prior 
art (i.e., that the polypeptide of the prior art does not possess the same material 
structure and functional characteristics of the claimed polypeptide). See In re Best, 562
F.2d 1252, 195 USPQ 430 (CCPA 1977) and In re Fitzgerald et al., 205 USPQ 594.

Alternatively, O'Brien et al. teaches a method of expressing polypeptides via a
vector in host cells. O'Brien et al. also teaches that the protease domain could be
released and used as a diagnostic, which also has potential as a target for therapeutic
intervention (Column 15, lines 35-38). Therefore, it would have been obvious to one
having ordinary skill in the art at the time the invention was made to express the
protease domain of SEQ ID NO:14 and purify the polypeptide. The motivation for making
such a polypeptide is to use it as a diagnostic which also has potential as a target for
therapeutic intervention. One of ordinary skill in the art would have had a reasonable
expectation of success since expression of a heterologous polypeptide is routine in the
art and O'Brien et al. teaches how to express heterologous polypeptides.

In response to the previous Office Action, applicants have traversed the above 
rejections. 

Applicants argue that O'Brien et al. does not anticipate any of the instant claims 
because the claims are not directed to a full-length MTSP polypeptide. Examiner 
respectfully disagrees. The claim recites "a polypeptide comprising a MTSP portion"
and does not recite the limitation that the polypeptide consist only of the MTSP
portion. Therefore, the full-length MT-SP1 of O'Brien et al. anticipates the instant claims.

Applicants also argue that one of skill in the art would recognize the disclosure of 
the polypeptide of O'Brien as not disclosing a single chain polypeptide. Examiner 
respectfully disagrees. A single chain polypeptide is one sequence of amino acids 
beginning with a carboxyl end and terminating with an amino end, wherein the amino 
acids are connected via peptide bonds. Therefore, the full length MT-SP1 of O'Brien et 
al. can be construed as a single chain polypeptide. 

Applicants argue that one of skill in the art would understand MTSP serine
proteases to be active only as two chain polypeptides, citing Lu et al. (1999) J. Biol.
Chem. 272:31293-300, and would not view O'Brien et al. as disclosing a single chain
polypeptide. Examiner respectfully disagrees. The bibliographic information for Lu et al.
(1999) J. Biol. Chem. 272:31293-300 could not be located through J. Biol. Chem.
Applicants are urged to supply the reference or the correct bibliographic information.
Nevertheless, applicants state that "as expressed, the MTSP polypeptide is an inactive 
single-chain zymogen" (Remarks page 42). Therefore, according to applicants, the full 
length MT-SP1 of O'Brien et al. is a single chain polypeptide and therefore, anticipates 
the claimed invention. 

Hence the rejection is maintained. 

Applicants also argue that O'Brien et al. provides no teaching or suggestion of 
smaller fragments having serine protease activity because it does not teach how to 
make a single chain polypeptide that has serine protease activity. Examiner respectfully 
disagrees. O'Brien et al. teaches a method of expressing polypeptides via a vector in 
host cells. It is well within the skill available in the art to purify the protease domain 
since O'Brien et al. identifies the protease domain. Therefore, it would have been 
obvious to one having ordinary skill in the art at the time the invention was made to 
express the protease domain of SEQ ID NO:14 and purify the polypeptide. The
motivation for making such a polypeptide is to use it as a diagnostic which has the
potential for a target for therapeutic intervention. One of ordinary skill in the art would 
have had a reasonable expectation of success since expression of a heterologous 
polypeptide is routine in the art and O'Brien et al. teaches how to express heterologous 
polypeptides. 

Applicants again argue that at the time of filing the instant application, one of skill
in the art would not have had a reasonable expectation of success to express the
protease domain because the art evidences that a single-chain polypeptide would not
have been expected to have protease activity. Examiner respectfully disagrees. The
claims are drawn to a polypeptide comprising a fragment consisting of a protease
domain of SEQ ID NO:2. Therefore, said polypeptide being a single-chain
polypeptide is an inherent property of said polypeptide, since two polypeptides having
identical structure will have identical function and physical and chemical properties.
Hence the rejections are maintained.

Claims 35-36, 40-42 and 113-114 are rejected under 35 U.S.C. 103(a) as being
unpatentable over O'Brien et al. 

Claims 35-36 are drawn to a conjugate comprising a polypeptide comprising a
serine protease domain of MTSP and a targeting agent. Claims 40-42 and 113-114 are
drawn to a solid support comprising a polypeptide comprising a serine protease domain
of MTSP.

O'Brien et al. (U.S. Patent No. 5,972,616, reference P, PTO-1449) teaches a
polypeptide having 100% identity to the full length MTSP1 of SEQ ID NO:2 of the instant
invention, as discussed above. O'Brien et al. also teaches that the protease domain
could be released and used as a diagnostic, which also has potential as a target for
therapeutic intervention (Column 15, lines 35-38).

O'Brien et al. also teaches a method of making fragments of SEQ ID NO:2
(Column 9, lines 22-55). O'Brien et al. teaches said fragments linked to another
polypeptide (Column 9, lines 54-55) and conjugated to bridging molecules (Column 6,
lines 27-39) for detecting the polypeptide. Assays using polypeptides linked to the
molecules taught by O'Brien et al. utilize solid supports.

Therefore, it would have been obvious to one having ordinary skill in the art at
the time the claimed invention was made to make a polypeptide comprising the
serine protease domain of SEQ ID NO:2 taught by O'Brien et al. and to make
conjugates and solid supports comprising a polypeptide comprised of the serine
protease domain of SEQ ID NO:2. The motivation for making such a polypeptide is to
use it as a diagnostic which also has potential as a target for therapeutic intervention.
The motivation for making conjugates and solid supports comprising said polypeptide
is to use the conjugates and solid supports in a variety of diagnostic assays. One of
ordinary skill in the art would have had a reasonable expectation of success since making
fragments of a polypeptide is routine in the art and O'Brien et al. teaches how to make
fragments of SEQ ID NO:2. One of ordinary skill in the art would also have had a
reasonable expectation of success since diagnostic assays using conjugates and solid
supports comprising a polypeptide are very well known, as taught by O'Brien et al.

Therefore, the above references render claims 35-36 and 40-42 prima facie
obvious to one of ordinary skill in the art. 

In response to the previous Office Action, applicants have traversed the above 
rejections. Applicants argue that the teachings of O'Brien et al. do not result in the
instantly claimed compositions because O'Brien et al. does not teach or suggest a
single chain polypeptide that includes a MTSP protease domain where the polypeptide
does not include any additional MTSP portions and the polypeptide has serine protease
activity. O'Brien et al. does teach or suggest a single chain polypeptide comprising a
MTSP portion, wherein the MTSP portion is a protease domain and wherein the MTSP
portion has serine protease activity and wherein the MTSP portion is the only portion of
the polypeptide, because O'Brien et al. identifies the serine protease domain and one
having ordinary skill in the art at the time the invention was filed would have been
motivated to purify the serine protease domain of O'Brien et al., as discussed above.
Hence the rejection is maintained. 

Claims 19-20 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
O'Brien et al. and Estell et al. in view of Takeuchi et al. 

Claims 19-20 are drawn to a polypeptide comprising the serine protease domain 
of a MTSP wherein free Cys residues are substituted with Ser residues. 

O'Brien et al. teaches a serine protease domain of a MTSP polypeptide, as 
discussed above. 

The reference of O'Brien et al. does not teach a serine protease domain of a 
MTSP polypeptide wherein free Cys residues have been replaced with Ser residues.

It is well known in the art that proteins form disulfide bonds via the SH groups of 
Cys residues. Upon making a polypeptide comprising a serine protease domain, a Cys 
residue which normally forms disulfide bonds in the full length polypeptide may be left 
free. For example, Takeuchi et al. (Reference IJ: PTO-1449) teaches that the cysteine at
position 731 of SEQ ID NO:2 normally forms a disulfide bond with a Cys residue in the
pro-protease domain (see page 11060, top left paragraph and Figures 1 and 2).

Cys residues are sensitive to oxidation due to their SH side group. Estell et al.
(U.S. Patent No. 5,346,823) teaches that Cys residues can be replaced with Ser residues to
decrease a polypeptide's susceptibility to oxidation (Abstract and Column 10, lines 34-38).
Ser residues have side chains similar to those of Cys residues, and substitution of a Cys
residue with a Ser residue is a conservative substitution.

Therefore, it would have been obvious to one having ordinary skill in the art at 
the time the claimed invention was made to replace free Cys residues in the protease
domain taught by O'Brien et al. with Ser residues. One of ordinary skill in the art would
be motivated to make such a change in order to enhance stability of the polypeptide. 
One of ordinary skill in the art would have had a reasonable expectation of success 
since Estell et al. teaches successful decrease of a protein's susceptibility to oxidation 
by substituting residues sensitive to oxidation with conservative substitutions. 

Therefore, the above references render claims 1 and 16, 18-20, 34 and 137
prima facie obvious to one of ordinary skill in the art.

In response to the previous Office Action, applicants have traversed the above 
rejections. Applicants argue that the combination of the teachings of O'Brien et al. with 
the teachings of Estell et al.. and Takeuchi et al. does not result in the instantly claimed 
methods because O'Brien et al. does not teach or suggest a single chain polypeptide 
that includes a MTSP protease domain where the polypeptide does not include any 
additional MTSP portions and the polypeptide has serine protease activity and that
neither Takeuchi et al. nor Estell et al. remedy the defects of O'Brien et al. First, the 
claims are product claims and not method claims. Second, O'Brien et al. does teach or 
suggest a single chain polypeptide comprising a MTSP portion, wherein the MTSP 
portion is a protease domain and wherein the MTSP portion has serine protease activity 
and wherein the MTSP portion is the only portion of the polypeptide because O'Brien et 
al. identifies the serine protease domain and one having ordinary skill in the art at the 
time the invention was filed would have been motivated to purify the serine protease 
domain of O'Brien et al. as discussed above. 

Applicants argue that Takeuchi et al. teaches that every cysteine residue of the 
protein is disulfide bonded and therefore Takeuchi et al. does not teach or suggest an
MTSP protease domain having a free Cys residue. Examiner respectfully disagrees. 
Figure 4, to which applicants refer, illustrates the disulfide bonds of cysteine residues of the
full-length MTSP; for example, the Cys at position 830 is disulfide bonded to the Cys at
position 191.

Hence the rejections are maintained. 



None of the claims are in condition for allowance. 
Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Yong Pak, whose telephone number is 571-272-0935.
The examiner can normally be reached 6:30 A.M. to 5:00 P.M. Monday through 
Thursday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Ponnathapu Achutamurthy can be reached on 571-272-0928. The fax 
phone numbers for the organization where this application or proceeding is assigned 
are 571-273-8300 for regular communications and 703-872-9307 for After Final 
communications. 

Any inquiry of a general nature or relating to the status of this application or 
proceeding should be directed to the receptionist whose telephone number is 571-272- 



Yong D. Pak 

Patent Examiner 1652